Field of the Invention

The present invention relates to methods, systems, and kits useful for 
the identification of molecules that specifically bind to defined nucleic acid 
sequences. Also described are methods for designing molecules having the 
ability to bind defined nucleic acid sequences and compositions thereof.

Background of the Invention

Several classes of small molecules that interact with double-stranded DNA
have been identified. Many of these small molecules have profound biological
effects. For example, many aminoacridines and polycyclic hydrocarbons bind DNA
and are mutagenic, teratogenic, or carcinogenic. Other small molecules that
bind DNA include: biological metabolites, some of which have applications as
antibiotics and antitumor agents, including actinomycin D, echinomycin,
distamycin, and calicheamicin; planar dyes, such as ethidium and acridine
orange; and molecules that contain heavy metals, such as cisplatin, a potent
antitumor drug.

The sequence binding preferences of most known DNA-binding molecules have
not, to date, been identified. However, several small DNA-binding molecules
have been shown to preferentially recognize specific nucleotide sequences. For
example: echinomycin has been shown to preferentially bind the sequence
[(A/T)CGT]/[ACG(A/T)] (Gilbert et al.); cisplatin has been shown to covalently
cross-link a platinum molecule between the N7 atoms of two adjacent
deoxyguanosines (Sherman et al.); and calicheamicin has been shown to
preferentially bind and cleave the sequence TCCT/AGGA (Zein et al.).

Many therapeutic DNA-binding molecules (such as distamycin) that were
initially identified based on their therapeutic activity in a biological screen
have been later determined to bind DNA. There are several examples in the
literature referring to synthetic or naturally-occurring polymers of
DNA-binding drugs. Netropsin, for example, is a naturally-occurring oligopeptide
that binds to the minor groove of double-stranded DNA. Netropsin contains two
4-amino-1-methylpyrrole-2-carboxylate residues and belongs to a family of
similar biological metabolites from Streptomyces spp. This family includes
distamycin, anthelvencin (both of which contain three N-methylpyrrole
residues), noformycin, amidomycin (both of which contain one N-methylpyrrole
residue) and kikumycin (which contains two N-methylpyrrole residues, like
netropsin) (Debart, et al.). Synthetic molecules of this family have also been
described, including the above-mentioned molecules (Lown, et al. 1985) as well
as dimeric derivatives (Griffin et al., Gurskii, et al.) and certain analogues
(Bialer, et al. 1980, Bialer, et al. 1981, Krowicki, et al.).

Molecules in this family, particularly netropsin and distamycin, have
been of interest because of their biological activity as antibacterial (Thrum
et al., Schuhmann, et al.), antiparasitic (Nakamura et al.), and antiviral
drugs (Becker, et al., Lown, et al. 1986, Werner, et al.).

Among the synthetic analogs of netropsin and distamycin are oligopeptides
that have been designed to have sequence preferences different from their
parent molecules. Such oligopeptides include the "lexitropsin" series of
analogues. The N-methylpyrrole groups of the netropsin series were
systematically replaced with N-methylimidazole residues, resulting in
lexitropsins with increased and altered sequence specificities from the parent
compounds (Kissinger, et al.). Further, a number of poly(N-methylpyrrolyl)-
netropsin analogues have been designed and synthesized which extend the number
of residues in the oligopeptides to increase the size of the binding site
(Dervan, 1986).

There are several different approaches that could be taken to look for
small molecules that specifically inhibit the interaction of a given
DNA-binding protein with its binding sequence (cognate site). One approach would
be to test biological or chemical compounds for their ability to preferentially
block the binding of one specific DNA:protein interaction but not others. Such
an assay would depend on the development of at least two, preferably three,
DNA:protein interaction systems in order to establish controls for
distinguishing between general DNA-binding molecules (polycations like heparin or
intercalating agents like ethidium) and DNA-binding molecules having sequence
binding preferences that would affect protein/cognate binding site interactions
in one system but not the other(s).
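
The control logic described above can be sketched in a few lines of Python. This is only an illustration: the function name, the system labels, and the three category strings are hypothetical, and the boolean "blocked" calls are assumed to come from whatever readout a given assay provides.

```python
def classify_inhibitor(blocked):
    """Classify a test molecule from its effect on several DNA:protein systems.

    `blocked` maps a (hypothetical) system name to True if the molecule
    blocked the DNA:protein interaction in that system, else False.
    """
    if len(blocked) < 2:
        # The text calls for at least two, preferably three, control systems.
        raise ValueError("at least two DNA:protein systems are required")
    n_blocked = sum(1 for hit in blocked.values() if hit)
    if n_blocked == 0:
        return "no effect"
    if n_blocked == len(blocked):
        # Blocks every system: behaves like heparin or ethidium.
        return "general DNA-binding molecule"
    # Blocks some systems but not others.
    return "candidate sequence-preferential molecule"


# Hypothetical readouts for three systems.
general = classify_inhibitor({"system_1": True, "system_2": True, "system_3": True})
selective = classify_inhibitor({"system_1": True, "system_2": False, "system_3": False})
```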

One illustration of how this system could be used is as follows. Each
cognate site could be placed 5' to a reporter gene (such as genes encoding
β-galactosidase or luciferase) such that binding of the protein to the cognate
site would enhance transcription of the reporter gene. The presence of a
sequence-specific DNA-binding drug that blocked the DNA:protein interaction
would decrease the enhancement of the reporter gene expression. Several DNA
enhancers could be coupled to reporter genes, then each construct compared to
one another in the presence or absence of small DNA-binding test molecules. In
the case where multiple protein/cognate binding sites are used for screening, a
competitive inhibitor that blocks one interaction but not the others could be
identified by the lack of transcription of a reporter gene in a transfected
cell line or in an in vitro assay. Only one such DNA-binding sequence,
specific for the protein of interest, could be screened with each assay system.
This approach has a number of limitations, including limited testing capability
and the need to construct the appropriate reporter system for each different
protein/cognate site of interest.

Another example of a system to detect sequence-specific DNA-binding
molecules would involve cloning a DNA-binding protein of interest, expressing the
protein in an expression system (e.g., bacterial, baculovirus, or mammalian
expression systems), preparing a purified or partially purified sample of
protein, then using the protein in an in vitro competition assay to detect
molecules that blocked the DNA:protein interaction. These types of systems are
analogous to many receptor:ligand or enzyme:substrate screening assays
developed in the past, but have the same limitations as outlined above in that
a new system must be developed for every different protein/cognate site
combination of interest. The capacity for screening numerous different
sequences is therefore limited.

Another example of a system designed to detect sequence-specific
DNA-binding drugs would be the use of DNA footprinting procedures as described in
the literature. These methods include DNase I or other nuclease footprinting
(Chaires, et al.), hydroxyl radical footprinting (Portugal, et al.),
methidiumpropyl EDTA(iron) complex footprinting (Schultz, et al.),
photofootprinting (Jeppesen, et al.), and bidirectional transcription
footprinting (White, et al.). These procedures are likely to be accurate
within the limits of their sequence testing capability but are seriously
limited by (i) the number of different DNA sequences that can be used in one
experiment (typically one test sequence that represents the binding site of the
DNA-binding protein under study), and (ii) the difficulty of developing high
throughput screening systems.

Summary of the Invention 

In one aspect, the invention includes a method of constructing a
DNA-binding agent capable of sequence-specific binding to a duplex DNA target
region. The method includes identifying, in the duplex DNA, a target region
containing a series of at least two non-overlapping base-pair sequences of four
base-pairs each, where the four base-pair sequences are adjacent, and each
sequence is characterized by sequence-preferential binding to a duplex
DNA-binding small molecule. The small molecules are coupled to form a DNA-binding
agent capable of sequence-specific binding to said target region.

In one embodiment, the duplex-binding small molecules are identified as
molecules capable of binding to a selected test sequence in a duplex DNA by
first adding a molecule to be screened to a test system composed of (a) a
DNA-binding protein that is effective to bind to a screening sequence in a duplex
DNA, with a binding affinity that is substantially independent of the test
sequence adjacent the screening sequence, but that is sensitive to binding of
molecules to such test sequence, when the test sequence is adjacent the
screening sequence, and (b) a duplex DNA having said screening and test
sequences adjacent one another, where the binding protein is present in an
amount that saturates the screening sequence in the duplex DNA.

The test molecule is incubated in the test system for a period sufficient
to permit binding of the molecule being tested to the test sequence in the
duplex DNA. The degree of binding protein bound to the duplex DNA before
adding the test molecule is compared with that after adding the molecule. The
screening sequence may be from the HSV origin of replication, and the binding
protein may be UL9. Exemplary screening sequences are identified as SEQ ID
NO:601, SEQ ID NO:602, SEQ ID NO:615, and SEQ ID NO:641.

Specific examples of tetrameric basepair sequences include TTTC, TTTG,
TTAC, TTAG, TTGC, TTGG, TTCC, TTCG, TATC, TATG, TAAC, TAAG, TAGC, TAGG, TACC,
TAGC sequences. A specific example of a small molecule capable of binding to
these sequences is distamycin.

In another aspect, the invention includes a method of blocking
transcriptional activity from a duplex DNA template. The method includes
identifying, in the duplex DNA, a binding site for a transcription factor and,
adjacent the binding site, a target region having a series of at least two
non-overlapping tetrameric base-pair sequences, where the four (tetrameric)
base-pair sequences are adjacent and each sequence is characterized by
sequence-preferential binding to a duplex DNA-binding small molecule. The sequences are
contacted with a binding agent composed of the small molecules coupled to form
a DNA-binding agent capable of sequence-specific binding to said target region.

The target may be selected, for example, from DNA sequences adjacent a
binding site for a eucaryotic transcription factor, such as transcription
factor TFIID, or a procaryotic transcription factor, such as transcription
sigma factor.

For mammalian transcription factors, the target region is typically
chosen from non-conserved regions adjacent the transcription factor binding
site. Target regions can be chosen so that the small molecule binding overlaps
an adjacent transcription factor DNA binding sequence (e.g., for a TFIID
binding site, by 1-3 nucleotide pairs). In this case, the specificity of DNA
binding for the small molecule is essentially derived from the non-conserved
sequences adjacent the transcription factor binding site, in order to reduce
small molecule binding at the transcription factor binding site associated with
other genes.

Also disclosed is a DNA-binding agent capable of binding with
base-sequence specificity to a target region in duplex DNA, where the target region
contains at least two adjacent four base-pair sequences. The agent includes at
least two subunits, where each subunit is a small molecule which has a
sequence-preferential binding affinity for a sequence of four base-pairs in the
target region. The subunits are coupled to form a DNA-binding agent capable of
sequence-specific binding to said target region.

In one general embodiment, the agent is designed for binding to a
sequence in which the two tetrameric basepair sequences are separated (for
example, by up to 20 basepairs, typically, 1 to 6 basepairs) and the small
molecules in the agent are coupled to each other by a spacer molecule.
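
As a rough illustration of what "identifying a target region" with two tetrameric sequences and an intervening spacer could look like computationally, the following Python sketch scans one strand of a sequence for a first tetramer followed by a second tetramer within a chosen gap range. The function name, the example sequence, and the default gap limits are assumptions made for illustration; only the top strand as written is searched.

```python
import re


def find_target_regions(dna, first, second, min_gap=0, max_gap=20):
    """Return (start, gap, matched_text) for each site where `first` is
    followed by `second` with min_gap..max_gap intervening base pairs."""
    dna = dna.upper()
    hits = []
    for gap in range(min_gap, max_gap + 1):
        # A lookahead is used so that overlapping candidate sites are reported.
        pattern = re.compile(
            f"(?=({re.escape(first)}[ACGT]{{{gap}}}{re.escape(second)}))"
        )
        for match in pattern.finditer(dna):
            hits.append((match.start(), gap, match.group(1)))
    return sorted(hits)


# Hypothetical example: two TTCC tetramers separated by a 2 bp spacer.
sites = find_target_regions("GGTTCCAATTCCGG", "TTCC", "TTCC", min_gap=1, max_gap=6)
```

Setting `min_gap=0` and `max_gap=0` restricts the search to immediately adjacent tetramers; a gap range of 1 to 6 corresponds to the typical spacer length mentioned above.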

Also forming part of the invention is a method of constructing a binding
agent capable of sequence-specific binding to a duplex DNA target region. The
method includes identifying, in the duplex DNA, a target region containing (i) a
series of at least two adjacent non-overlapping base-pair sequences of four
base-pairs each, where each four base-pair sequence is characterized by
sequence-preferential binding to a duplex DNA-binding small molecule, and (ii)
adjacent to (i), a DNA duplex region capable of forming a triplex with a
third-strand oligonucleotide. The two small molecules are coupled to form a
DNA-binding agent capable of sequence-specific binding to said target region, and
the DNA-binding agent is attached to a third-strand oligonucleotide.

The binding of the DNA-binding agent to duplex DNA causes a shift from B 
form to A form DNA, allowing triplex binding between the third-strand 
polynucleotide and a portion of the target sequence. 

Also disclosed is a triple-strand forming agent for use in practicing the
method.

In still another aspect, the invention includes a method of ordering the
sequence binding preferences of a DNA-binding molecule. The method includes
adding a molecule to be screened to a test system composed of (a) a DNA-binding
protein that is effective to bind to a screening sequence in a duplex DNA with
a binding affinity that is substantially independent of such test sequence
adjacent the screening sequence, but that is sensitive to binding of molecules
to such test sequence, and (b) a duplex DNA having said screening and test
sequences adjacent one another, where the binding protein is present in an
amount that saturates the screening sequence in the duplex DNA. The molecule
in the test system is incubated for a period sufficient to permit binding of
the molecule being tested to the test sequence in the duplex DNA, and the
amount of binding protein bound to the duplex DNA before and after addition of
the test molecule is compared. These steps are repeated using all test
sequences of interest, and the sequences are then ordered on the basis of
relative amounts of protein bound in the presence of the molecule for each test
sequence.
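
The ordering step can be sketched as a simple sort. In the Python example below, the function name and the numerical values are hypothetical, and the sketch assumes that a lower relative amount of protein bound indicates stronger binding of the test molecule to that test sequence (which would not hold for molecules that enhance protein binding).

```python
def order_test_sequences(protein_bound):
    """Order test sequences from lowest to highest relative protein bound.

    `protein_bound` maps each test sequence to the relative amount of
    binding protein that remains bound in the presence of the test molecule.
    Ties are broken alphabetically so the ordering is reproducible.
    """
    return sorted(protein_bound, key=lambda seq: (protein_bound[seq], seq))


# Hypothetical relative amounts of protein bound (1.0 = no displacement).
relative_bound = {"TTCC": 0.20, "GATC": 0.95, "TTTG": 0.35, "CGCG": 0.90}
ordered = order_test_sequences(relative_bound)
```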

The test sequences are selected, for example, from the group of 256
possible four base sequences composed of A, G, C and T. The DNA screening
sequence is preferably from the HSV origin of replication, and the binding
protein is preferably UL9.

The invention also includes a method for altering the binding
characteristics of a DNA-binding protein to a duplex DNA. In the method, a
binding site for the DNA-binding protein is identified in the duplex DNA and a
target region is identified adjacent the binding site. A small molecule is
selected that is characterized by sequence-preferential binding to the target
region. Such molecules can be selected by the assay and methods of the present
invention. When the small molecule is bound to the target region, the small
molecule is typically adjacent to the binding site for the DNA-binding protein.
Alternatively, the binding of the small molecule may overlap the site for
the DNA-binding protein by at least one nucleotide pair. In the case of such
overlap, the specificity of DNA binding for the small molecule is essentially
derived from non-conserved sequences adjacent the DNA-binding protein's binding
site, in order to reduce small molecule binding at similar DNA:protein
binding sites at other locations. Finally, the duplex DNA is contacted with the
small molecule at a concentration effective to alter binding of the
DNA-binding protein to its binding site.

In this method, contacting the duplex DNA with a small molecule can
either inhibit or enhance the binding of the DNA-binding protein to its binding
site, depending on the small molecule that is selected. Exemplary DNA binding
proteins include DNA replication factors and a variety of transcription
factors.

One application of this method is to eucaryotic general transcription
factors (e.g., TFIID), where the target region is typically selected from DNA
sequences adjacent the binding site for the eucaryotic transcription factor
(e.g., SEQ ID NO:1 to SEQ ID NO:600). In one embodiment, the DNA binding
protein is a eucaryotic general transcription factor and the small molecule
binds, in addition to the target region, one to three nucleotide pairs of the
DNA-binding protein's binding site. In the case of TFIID, the small molecule
typically binds to (i) the target region, and (ii) up to two nucleotides of the
binding site for TFIID, where the nucleotides are contiguous to the target
region.

Generally, the present invention provides a method of screening for
molecules capable of binding to a selected test sequence in a duplex DNA. In
the method of the present invention a test sequence of interest is selected.
Such sequences can be selected, for example, from the group of sequences
presented as SEQ ID NO:1 to SEQ ID NO:600. Alternatively, the test sequences
can be sequences having randomly generated sequences or defined sets of
sequences, such as the group of 256 possible four base sequences composed of
A, G, C and T.
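
A defined set of test sequences of this kind is straightforward to enumerate. The Python sketch below generates the 256 possible four-base sequences and, as a second example of a defined set, the sequences sharing the base composition of 5'-GATC-3' (the set described for Figure 14A). The variable names are illustrative only.

```python
from itertools import permutations, product

BASES = "AGCT"

# All 4**4 = 256 possible four-base test sequences (top strand, 5'-3').
all_tetramers = ["".join(p) for p in product(BASES, repeat=4)]

# All distinct four-base sequences with one each of G, A, T and C.
same_composition = sorted({"".join(p) for p in permutations("GATC")})

print(len(all_tetramers), len(same_composition))
```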

A duplex DNA test oligonucleotide is constructed having a screening
sequence adjacent a selected test sequence, where a DNA binding protein is
effective to bind to the screening sequence with a binding affinity that is
substantially independent of the adjacent test sequence. In such constructs
the DNA:protein binding to the screening sequence is sensitive to binding of
test molecules to the test sequence.

Molecules selected for testing/screening are added to a test system
composed of (a) the DNA binding protein, and (b) the duplex DNA test
oligonucleotide, which contains the screening and test sequences adjacent one
another. Selected molecules are incubated in the test system for a period
sufficient to permit binding of the molecule being tested to the test sequence
in the duplex DNA. The amount of binding protein bound to the duplex DNA is
compared before and after adding a test molecule. Comparison of the amount of
binding protein bound to the duplex DNA before and after adding a test molecule
can be accomplished, for example, using a gel band-shift assay or a
filter-binding assay.
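
One simple way to express the before/after comparison is as the percentage of bound protein that remains after the test molecule is added. The Python sketch below is an assumption about how such a comparison might be computed; the function name and the example signal values are hypothetical, and the inputs could be any proportional readout (for example, band intensity or counts retained on a filter).

```python
def percent_bound_retained(bound_before, bound_after):
    """Percentage of DNA:protein complex remaining after adding a test molecule.

    Both arguments are signals proportional to the amount of binding protein
    bound to the duplex DNA, measured in the same units.
    """
    if bound_before <= 0:
        raise ValueError("bound_before must be positive")
    return 100.0 * bound_after / bound_before


# Hypothetical signals: 1000 units bound before, 250 units after.
retained = percent_bound_retained(1000, 250)
```

A value well below 100 would suggest that the test molecule interferes with protein binding; a value above 100 would suggest enhanced binding.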

In the method of the present invention a number of DNA:protein
interactions may be used for screening purposes. In one embodiment, the DNA
screening sequence is from the HSV origin of replication and the binding
protein is UL9. Exemplary HSV origin of replication screening sequences
include SEQ ID NO:601, SEQ ID NO:602, SEQ ID NO:615, and SEQ ID NO:641.

Other DNA:protein interactions useful in the practice of the present
invention include restriction endonucleases and their cognate DNA-binding
sequences. These reactions are typically carried out in the absence of
divalent cations.

In another embodiment, the invention includes a method of identifying
test sequences in duplex DNA to which binding of a test molecule is most
preferred. In this method a mixture of duplex DNA test oligonucleotides is
constructed, where each oligonucleotide has a screening sequence adjacent a
test sequence as described above. The test oligonucleotides of the mixture
typically contain different test sequences.

A test molecule, to be screened, is added to a test reaction composed of
(a) the DNA binding protein, and (b) the duplex DNA test oligonucleotide
mixture. The molecule is incubated in the test reaction for a period
sufficient to permit binding of the compound being tested to test sequences in
the duplex DNA. Test oligonucleotides are separated from test oligonucleotides
bound to binding protein.

The test oligonucleotides can be separated from test oligonucleotides
bound to protein by, for example, passing the test reaction through a filter,
where the filter is capable of capturing DNA:protein complexes but not DNA that
is free of protein. One filter type useful in the practice of the present
invention is the nitrocellulose filter.

The separated test oligonucleotides are then amplified. These amplified
test oligonucleotides are then recycled through the screening steps of the
assay in order to obtain a desired degree of selection. The amplified test
oligonucleotides are isolated and sequenced.
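
The effect of recycling the amplified oligonucleotides through repeated rounds of selection can be illustrated with a toy model. The Python sketch below assumes that the protein-free oligonucleotides passing through the filter are the ones amplified, that each test sequence has a fixed pass-through probability in the presence of the test molecule, and that amplification preserves relative proportions. All names and numbers are hypothetical.

```python
def selection_round(pool, pass_through):
    """Apply one round of filter separation followed by amplification.

    `pool` maps each test sequence to its fraction of the mixture;
    `pass_through` maps each test sequence to the (assumed) probability
    that an oligonucleotide carrying it is protein-free and passes the filter.
    """
    weighted = {seq: frac * pass_through[seq] for seq, frac in pool.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no oligonucleotides passed the filter")
    # Amplification is modeled as renormalizing the surviving fractions.
    return {seq: w / total for seq, w in weighted.items()}


def enrich(pool, pass_through, rounds):
    """Recycle the pool through `rounds` rounds of selection."""
    for _ in range(rounds):
        pool = selection_round(pool, pass_through)
    return pool


start = {"TTCC": 0.25, "TTTG": 0.25, "GATC": 0.25, "CGCG": 0.25}
probs = {"TTCC": 0.8, "TTTG": 0.4, "GATC": 0.1, "CGCG": 0.1}
final = enrich(start, probs, rounds=3)
```

In this model the sequence with the highest pass-through probability comes to dominate the pool after a few rounds, which is the sense in which recycling increases the degree of selection.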

Exemplary test sequences include sequences selected from the group of 256
possible four base sequences composed of A, G, C and T. Further examples of
desirable test sequences include test sequences derived from the sequences
presented as SEQ ID NO:1 to SEQ ID NO:600.

The amplification step in the method may be accomplished by polymerase
chain reaction or other methods of amplification, including cloning and
subsequent in vivo amplification of the cloning vector containing the sequences
of interest.

These and other objects and features of the invention will be more fully 
appreciated when the following detailed description of the invention is read in 
conjunction with the accompanying drawings. 

Brief Description of the Figures

Figure 1A illustrates a DNA-binding protein binding to a screening
sequence. Figures 1B and 1C illustrate how a DNA-binding protein may be
displaced or hindered in binding by a small molecule by two different
mechanisms: because of steric hindrance (1B) or because of conformational
(allosteric) changes induced in the DNA by a small molecule (1C).

Figure 2 illustrates an assay for detecting inhibitory molecules based on
their ability to preferentially hinder the binding of a DNA-binding protein to
its binding site. Protein (O) is displaced from DNA (/) in the presence of
inhibitor (X). Two alternative capture/detection systems are illustrated: the
capture and detection of unbound DNA or the capture and detection of
DNA:protein complexes.

Figure 3 shows a DNA-binding protein that is able to protect a biotin
moiety, covalently attached to an oligonucleotide sequence, from being
recognized by streptavidin when a protein is bound to the DNA.

Figure 4 shows the incorporation of biotin and digoxigenin into a typical
oligonucleotide molecule for use in the assay of the present invention. The
oligonucleotide contains the binding sequence (i.e., the screening sequence) of
the UL9 protein, which is underlined, and test sequences flanking the screening
sequence. Figure 4 also shows the preparation of double-stranded
oligonucleotides end-labeled with either digoxigenin or 32P.

Figure 5 shows a series of sequences that have been tested in the assay 
of the present invention for the binding of sequence-specific small molecules. 

Figure 6 outlines the cloning, into an expression vector, of a truncated
form of the UL9 protein (UL9-COOH) which retains its sequence-specific
DNA-binding ability.

Figure 7 shows the pVL1393 baculovirus vector containing the full length 
UL9 protein coding sequence. 

Figure 8 is a photograph of an SDS-polyacrylamide gel showing (i) the
purified UL9-COOH/glutathione-S-transferase fusion protein and (ii) the
UL9-COOH polypeptide.

Figure 9 presents data demonstrating the effect on UL9-COOH binding of 
alterations in the test sequences that flank the UL9 screening sequence. 

Figure 10A shows the effect of the addition of several concentrations of
distamycin A to DNA:protein assay reactions utilizing different test sequences.
Figure 10B shows the effect of the addition of actinomycin D to DNA:protein
assay reactions utilizing different test sequences. Figure 10C shows the
effect of the addition of doxorubicin to DNA:protein assay reactions utilizing
different test sequences.

Figure 11A illustrates a DNA capture system of the present invention
utilizing biotin and streptavidin coated magnetic beads. The presence of the
DNA is detected using an alkaline-phosphatase substrate that yields a
chemiluminescent product. Figure 11B shows a similar reaction using biotin
coated agarose beads that are conjugated to streptavidin, which in turn is
conjugated to the captured DNA.

Figure 12 demonstrates a test matrix based on DNA:protein-binding data.

Figure 13 lists the top strands (5'-3') of all the possible four base
pair sequences that could be used as a defined set of ordered test sequences in
the assay.

Figure 14A lists the top strands (5'-3') of all the possible four base
pair sequences that have the same base composition as the sequence 5'-GATC-3'.
This is another example of a defined, ordered set of sequences that could be
tested in the assay. Figure 14B presents the general sequence of a test
oligonucleotide (SEQ ID NO:617), where XXXX is the test sequence and N = A, G,
C, or T.

Figure 15 shows the results of 4 duplicate experiments in which the
binding activity of distamycin was tested with all possible (256) four base
pair sequences. The oligonucleotides are ranked from 1 to 256 (column 1,
"rank") based on their average rank from the four experiments (column 13,
"ave. rank").

Figure 16 shows the average ranks (Figure 15) plotted against the ideal
ranks 1 to 256.

Figure 17 shows the average r% scores (Figure 15) plotted against the
rank of 1 to 256.

Figure 18 shows the results of eight experiments with actinomycin D. The
r% scores and rank are shown for each of the 256 oligonucleotides.

Figure 19 shows the average r% versus rank, by average rank (data from
Figure 18).

Figure 20 shows the ideal and average ranks for each of the 256
oligonucleotides.

Figure 21 shows the results of a position analysis for actinomycin D
preference.

Figure 22 presents the data for a dinucleotide analysis of actinomycin D 
binding preference. 

Figure 23 graphically displays the results presented in Figure 22.

Figure 24 graphically displays the data presented in Figure 22, where the
data are combined in a combined bar chart so that the cumulative results for
any dinucleotide pair are tabulated in a single bar.

Figure 25 shows the top strands of 16 possible duplex DNA target sites
for binding bis-distamycins.

Figure 26 shows examples of bis-distamycin target sequences for
bis-distamycins with internal flexible and/or variable length linkers targeted to
sites comprised of two TTCC sequences, where N is any base.

Figures 27A to 27H show sample oligonucleotides for competition binding 
studies using the assay of the present invention. 

Figure 28 shows the DNA sequences of the HIV pro-viral promoter region. 
Several transcription factor binding sites are marked. 

Figures 29A to 29D illustrate sample test oligonucleotides for use in the 
polymerase chain reaction based selection technique of the present invention. 
In Figure 29A, X is the number of bases that comprise the test site. 

Figure 30 illustrates a sample test oligonucleotide for use in the assay
of the present invention, where the test oligonucleotide employs several
different DNA:protein interaction systems.

Figure 31 illustrates the results of screening a selected test sequence
with a single DNA:protein interaction system. In the figure, the test site is
shown in bold, and the potential binding site for the test molecule is
underlined.

Figure 32 illustrates the results of screening the same selected test
sequence as shown in Figure 31, but using a different single DNA:protein
interaction system. In the figure, the test site is shown in bold, and the
potential binding site for the test molecule is underlined.

Detailed Description of the Invention 

I. Definitions:

Adjacent is used to describe the distance relationship between two
neighboring sites. Adjacent sites are 20 or fewer bp apart, and can be
separated by any smaller number of bases, including the situation where the
sites are immediately abutting one another. "Flanking" is a synonym for adjacent.

Bound DNA, as used in this disclosure, refers to the DNA that is bound by
the protein used in the assay (e.g., a test oligonucleotide containing the UL9
binding sequence bound to the UL9 protein).

Coding sequences or coding regions are DNA sequences that code for RNA 
transcripts, unless specified otherwise. 

Dissociation is the process by which two molecules cease to interact: 
the process occurs at a fixed average rate under specific physical conditions. 

Functional binding is the noncovalent association of a protein or small
molecule to the DNA molecule. In one embodiment of the assay of the present
invention the functional binding of the UL9 protein to a screening sequence
(i.e., its cognate DNA binding site) has been evaluated using filter binding or
gel band-shift experiments.

Half-life is herein defined as the time required for one-half of the
associated complexes, e.g., DNA:protein complexes, to dissociate.
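
As a worked illustration of this definition, the Python sketch below relates half-life to a dissociation rate constant. It assumes simple first-order dissociation with a constant rate `k_off`, which is an assumption introduced here for illustration (consistent with dissociation occurring at a fixed average rate under given conditions) and not a kinetic model stated in the text; the function names and example value are hypothetical.

```python
import math


def half_life(k_off):
    """Half-life of a complex under first-order dissociation: ln(2) / k_off."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


def fraction_remaining(k_off, t):
    """Fraction of complexes still associated after time t: exp(-k_off * t)."""
    return math.exp(-k_off * t)


# Hypothetical rate constant in units of 1/min.
k = 0.05
t_half = half_life(k)                      # time in minutes
check = fraction_remaining(k, t_half)      # one-half, up to rounding
```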

Heteropolymers are molecules comprised of at least two different
subunits, each representing a different type or class of molecule. The
covalent coupling of different subunits, such as DNA-binding molecules or
portions of DNA-binding molecules, results in the formation of a heteropolymer:
for example, the coupling of a non-intercalating homopolymeric DNA-binding
molecule, such as distamycin, to an intercalating drug, such as daunomycin.
Likewise, the coupling of netropsin, which is essentially a molecular subunit
of distamycin, to daunomycin would also be a heteropolymer. As a further
example, the coupling of distamycin, netropsin, or daunomycin to a DNA-binding
homopolymer, such as a triplex-forming oligonucleotide, would result in a
heteropolymer.

Homopolymers are molecules that are comprised of a repeating subunit of 
the same type or class. Two examples of duplex DNA-binding homopolymers are as 
follows: (i) triplex-forming oligonucleotides or oligonucleotide analogs, 
which are composed of repeating subunits of nucleotides or nucleotide analogs, 
and (ii) oligopeptides, which are composed of repeating subunits linked by 
peptide bonds (e.g., distamycin, netropsin).

Sequence-preferential binding refers to DNA binding molecules that
generally bind DNA but that show preference for binding to some DNA sequences
over others. Sequence-preferential binding is typified by several of the small
molecules tested in the present disclosure, e.g., distamycin.
Sequence-preferential and sequence-specific binding can be evaluated using a test matrix
such as is presented in Figure 12. For a given DNA-binding molecule, there is
a spectrum of differential affinities for different DNA sequences ranging from
non-sequence-specific (no detectable preference) to sequence preferential to
absolute sequence specificity (i.e., the recognition of only a single sequence
among all possible sequences, as is the case with many restriction
endonucleases).

Sequence-specific binding refers to DNA binding molecules which have a
strong DNA sequence binding preference. For example, the following demonstrate
typical sequence-specific DNA-binding: (i) multimers (heteropolymers and
homopolymers) of the present invention (e.g., Section IV.E.1, Multimerization;
Example 13), and (ii) restriction enzymes and the proteins listed in Table IV.

Screening sequence is the DNA sequence that defines the cognate binding 
site for the DNA binding protein: in the case of UL9, the screening sequence 
can, for example, be SEQ ID NO: 601. 

Small molecules are desirable as therapeutics for several reasons related 
to drug delivery, including the following: (i) they are commonly less than 
10 kDa in molecular weight; (ii) they are more likely to be permeable to cells; 
(iii) unlike peptides or oligonucleotides, they are less susceptible to 
degradation by many cellular mechanisms; and (iv) they are not as apt to elicit 
an immune response. Many pharmaceutical companies have extensive libraries of 
chemical and/or biological mixtures, often fungal, bacterial, or algal 
extracts, that would be desirable to screen with the assay of the present 
invention. Small molecules may be either biological or synthetic organic 
compounds, or even inorganic compounds (e.g., cisplatin). 

Test sequence is a DNA sequence adjacent to the screening sequence. The 
assay of the present invention screens for molecules that, when bound to the 
test sequence, affect the interaction of the DNA-binding protein with its 
cognate binding site (i.e., the screening sequence). Test sequences can be 
placed adjacent to either or both ends of the screening sequence. Typically, 
binding of molecules to the test sequence interferes with the binding of the 
DNA-binding protein to the screening sequence. However, some molecules binding 
to these sequences may have the reverse effect, causing an increased binding 
affinity of the DNA-binding protein for the screening sequence. Some molecules, 
even while binding in a sequence-specific or sequence-preferential manner, 
might have no effect in the assay. These molecules would not be detected in 
the assay. 

Unbound DNA, as used in this disclosure, refers to the DNA that is not 
bound by the protein used in the assay (i.e., in the examples of this 
disclosure, the UL9 protein). 



II. The Assay. 

One feature of the present invention is that it provides an assay to 
identify small molecules that will bind in a sequence-specific manner to 
medically significant DNA target sites. The assay facilitates the development 
of a new field of pharmaceuticals that operate by interfering with specific 
DNA functions, such as crucial DNA:protein interactions. A sensitive, 
well-controlled assay has been developed (i) to detect DNA-binding molecules 
and (ii) to determine their sequence-specificity and affinity. The assay can be 
used to screen large biological and chemical libraries. For example, the assay 
can be used to detect sequence-specific DNA-binding molecules in fermentation 
broths or extracts from various microorganisms. 

Furthermore, another application for the assay is to determine the 
sequence specificity and relative affinities of known DNA-binding drugs (and 
other DNA-binding molecules) for different DNA sequences. Such drugs, which 
are currently used primarily as antibiotics or anticancer drugs, may have 
previously unidentified activities that make them strong candidates for 
therapeutics or therapeutic precursors in entirely different areas of medicine. 
The use of the assay to determine the sequence-binding preference of these 
known DNA-binding molecules enables the rational design of novel DNA-binding 
molecules with enhanced sequence-binding preference. The methods for designing 
and testing these novel DNA-binding molecules are described below. 



The screening assay of the present invention is basically a competition 
assay that is designed to test the ability of a test molecule to compete with a 
DNA-binding protein for binding to a short, synthetic, double-stranded 
oligodeoxynucleotide that contains the recognition sequence for the DNA-binding 
protein flanked on either or both sides by a variable test site. The variable 
test site may contain any DNA sequence that provides a reasonable recognition 
sequence for a DNA-binding test molecule. Molecules that bind to the test site 
alter the binding characteristics of the protein in a manner that can be 
readily detected. The extent to which such molecules are able to alter the 
binding characteristics of the protein is likely to be directly proportional to 
the affinity of the test molecule for the DNA test site. The relative affinity 
of a given molecule for different oligonucleotide sequences at the test site 
(i.e., test sequences) can be established by examining the molecule's effect on 
the DNA:protein interaction using each of the test sequences. 

The assay can be used to test specific target sequences and to identify 
novel DNA-binding molecules. Also, the assay provides a means for the 
determination of the high affinity DNA binding sites for a given DNA-binding 
molecule, thus facilitating the identification of specific target sequences. 
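
The comparison across test sequences described above can be sketched in a few 
lines of Python. This is a minimal illustration only and is not the procedure 
of the Examples: the function name, the dictionary layout, and the numeric 
readings are hypothetical, and the sketch assumes that a larger increase in 
unbound DNA over the no-drug control reflects a higher relative affinity of 
the test molecule for that test sequence. 

```python
def rank_test_sequences(unbound_with_drug, unbound_control):
    """Rank test sequences by the increase in unbound DNA caused by a test molecule.

    Both arguments map a test-sequence label to the fraction of DNA found
    unbound (0.0 to 1.0), with and without the test molecule respectively.
    """
    effects = {
        label: unbound_with_drug[label] - unbound_control[label]
        for label in unbound_with_drug
    }
    # Largest increase first: assumed here to reflect the highest relative affinity.
    return sorted(effects.items(), key=lambda item: item[1], reverse=True)


# Hypothetical readings, for illustration only (not data from the disclosure).
control = {"polyT": 0.05, "polyA": 0.05, "ATAT": 0.05, "GGGC": 0.05}
treated = {"polyT": 0.60, "polyA": 0.55, "ATAT": 0.30, "GGGC": 0.07}
ranking = rank_test_sequences(treated, control)
```

Because every test oligonucleotide shares the same screening sequence, the 
same control subtraction can be applied to each one before ranking. 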

A. General Considerations. 

The assay of the present invention has been designed for detecting test 
molecules or compounds that affect the rate of transfer of a specific DNA 
molecule from one protein molecule to another identical protein in solution. 

A mixture of DNA and protein is prepared in solution. The concentration 
of protein is in excess of the concentration of the DNA so that virtually all 
of the DNA is found in DNA:protein complexes. The DNA is a double-stranded 
oligonucleotide that contains the recognition sequence for a specific 
DNA-binding protein (i.e., the screening sequence). The protein used in the 
assay contains a DNA-binding domain that is specific for binding to the 
sequence within the oligonucleotide. The physical conditions of the solution 
(e.g., pH, salt concentration, temperature) are adjusted such that the 
half-life of the complex is amenable to performing the assay (optimally a 
half-life of 5-120 minutes), preferably in a range that is close to normal 
physiological conditions. 

As one DNA:protein complex dissociates, the released DNA rapidly reforms 
a complex with another protein in solution. Since the protein is in excess of 
the DNA, dissociation of one complex always results in the rapid reassociation 
of the DNA into another DNA:protein complex. At equilibrium, very few DNA 
molecules will be unbound. If the unbound DNA is the component of the system 
that is measured, the minimum background of the assay is the amount of unbound 
DNA observed during any given measurable time period. If the capture/detection 
system used for capturing the unbound DNA is irreversible, the brevity of the 
observation period (the length of time used to capture the unbound DNA) and the 
sensitivity of the detection system define the lower limits of background DNA. 
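
The dependence of background on the observation period can be made concrete 
with a short calculation. The sketch below is an idealization, not part of 
the described assay: it assumes first-order dissociation of the DNA:protein 
complex and a capture system that irreversibly traps every released DNA 
molecule before it can rebind. The function name and the example values are 
hypothetical. 

```python
import math


def background_unbound_fraction(capture_minutes, half_life_minutes):
    """Fraction of complexes expected to dissociate during the capture window.

    Assumes first-order dissociation and irreversible, complete capture of
    released DNA (an idealization introduced for this sketch).
    """
    rate_constant = math.log(2) / half_life_minutes
    return 1.0 - math.exp(-rate_constant * capture_minutes)


# A capture window equal to one half-life releases half of the DNA.
one_half_life = background_unbound_fraction(10.0, 10.0)

# For a fixed 1-minute window, a longer half-life gives a lower background.
short_lived = background_unbound_fraction(1.0, 5.0)
long_lived = background_unbound_fraction(1.0, 120.0)
```

Under these assumptions, shortening the capture window or lengthening the 
half-life of the complex both reduce the background of unbound DNA. 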

Figure 1 illustrates how (i) such a protein can be displaced from its 
cognate binding site, (ii) a protein can be prevented from binding its cognate 
binding site, and (iii) the kinetics of the DNA:protein interaction can be 
altered. In each case, the binding site for the test molecule is located at a 
site flanking the recognition sequence for the DNA-binding protein (Figure 1A). 
One mechanism is steric hindrance of protein binding by a small molecule 
(competitive inhibition; Figure 1B). Alternatively, a molecule may interfere 
with a DNA:protein binding interaction by inducing a conformational change in 
the DNA (allosteric interference, noncompetitive inhibition; Figure 1C). In 
either event, if a test molecule that binds the oligonucleotide hinders binding 
of the protein, even transiently, the rate of transfer of DNA from one protein 
to another will be decreased. This will result in a net increase in the amount 
of unbound DNA and a net decrease in the amount of protein-bound DNA. In other 
words, an increase in the amount of unbound DNA or a decrease in the amount of 
bound DNA indicates the presence of an inhibitor, regardless of the mechanism 
of inhibition (competitive or noncompetitive). 

Alternatively, molecules may be isolated that, when bound to the DNA, 
cause an increased affinity of the DNA-binding protein for its cognate binding 
site. In this case, the assay control samples (no drug added) are adjusted to 
less than 100% DNA:protein complex so that the increase in binding can be 
detected. The amount of unbound DNA (observed during a given measurable time 
period after the addition of the molecule) will decrease and the amount of 
bound DNA will increase in the reaction mixture as detected by the 
capture/detection system described in Section II. 
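
The two outcomes described above (more unbound DNA for an inhibitor, less 
unbound DNA for a molecule that enhances binding) can be expressed as a simple 
decision rule. The following sketch is illustrative only: the function name 
and the threshold value are hypothetical, and in practice the threshold would 
depend on the noise of the capture/detection system used. 

```python
def classify_effect(unbound_control, unbound_with_molecule, threshold=0.05):
    """Classify a test molecule from the change in the unbound-DNA fraction.

    `threshold` is a hypothetical cutoff below which a change is treated as
    indistinguishable from the control.
    """
    change = unbound_with_molecule - unbound_control
    if change > threshold:
        return "inhibitor"      # more unbound DNA than in the control
    if change < -threshold:
        return "enhancer"       # less unbound DNA than in the control
    return "no detectable effect"


# Hypothetical readings. The enhancer case uses a control adjusted to less
# than 100% complex, as the text describes.
inhibitor_call = classify_effect(0.05, 0.40)
enhancer_call = classify_effect(0.50, 0.20)
silent_call = classify_effect(0.05, 0.06)
```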

B. Choosing and Testing an Appropriate DNA-Binding Protein. 

Experiments performed in support of the present invention have defined an 
approach for identifying molecules having sequence-preferential DNA-binding. 
In this approach, small molecules binding to sequences adjacent to the cognate 
binding sequence can inhibit the protein/cognate DNA interaction. This assay 
has been designed to use a single DNA:protein interaction to screen for 
sequence-specific or sequence-preferential DNA-binding molecules that recognize 
virtually any sequence. 

While DNA-binding recognition sites are usually quite small (4-17 bp), 
the sequence that is protected by the binding protein is larger (usually 5 bp 
or more on either side of the recognition sequence), as detected by DNase I 
protection (Galas, et al.) or methylation interference (Siebenlist, et al.). 

Experiments performed in support of the present invention demonstrated 
that a single protein and its cognate DNA-binding sequence can be used to assay 
virtually any DNA sequence by placing a sequence of interest adjacent to the 
cognate site: a small molecule bound to the adjacent site can be detected by 
alterations in the binding characteristics of the protein to its cognate site. 
Such alterations might occur by either steric hindrance (which would cause 
the dissociation of the protein) or induced conformational changes in the 
recognition sequence for the protein (which may cause either enhanced binding 
or, more likely, decreased binding of the protein to its cognate site). 

1. Criteria for Choosing an Appropriate DNA-Binding Protein. 

There are several considerations involved in choosing DNA:protein 
complexes that can be employed in the assay of the present invention, 
including: 

a.) The half-life of the DNA:protein complex should be short 
enough to accomplish the assay in a reasonable amount of time. The 
interactions of some proteins with their cognate binding sites in DNA can be 
measured in days, not minutes: such tightly bound complexes would 
inconveniently lengthen the period of time it takes to perform the assay. 

b.) The half-life of the complex should be long enough to allow 
the measurement of unbound DNA in a reasonable amount of time. For example, 
the level of free DNA is dictated by the ratio between the time needed to 
measure free DNA and the amount of free DNA that occurs naturally due to the 
dissociation of the complex during the measurement time period. 

In view of the above two considerations, practically useful DNA:protein 
half-lives fall in the range of approximately two minutes to several days: 
shorter half-lives may be accommodated by faster equipment and longer 
half-lives may be accommodated by destabilizing the binding conditions for the 
assay. 

c.) A further consideration is that the kinetics of the 
DNA:protein interaction should be relatively insensitive to the nucleotide 
sequences flanking the recognition sequence. The affinity of DNA-binding 
proteins may be affected by differences in the sequences adjacent to the 
recognition sequence. If the half-life of the complex is affected by the 
flanking sequence, the analysis of comparative binding data between different 
flanking oligonucleotide sequences becomes difficult but is not impossible. 

2. Testing DNA:Protein Interactions for Use in the Assay. 

a.) Other DNA:Protein Interactions Useful in the Method of the 
Present Invention. There are many known DNA:protein interactions that may be 
useful in the practice of the present invention, including (i) the DNA:protein 
interactions listed in Table IV, (ii) bacterial, yeast, and phage systems such 
as lambda oL-oR/cro, and (iii) modified restriction enzyme systems (e.g., 
protein binding in the absence of divalent cations, see Section IV). Any 
protein that binds to a specific recognition sequence may be useful in the 
present invention. One constraining factor is the effect of the immediately 
adjacent sequences (the test sequences) on the affinity of the protein for its 
recognition sequence. DNA:protein interactions in which there is little or no 
effect of the test sequences on the affinity of the protein for its cognate 
site are preferable for use in the described assay; however, DNA:protein 
interactions that exhibit test-sequence-dependent differential binding may 
still be useful if algorithms that compensate for the differential affinity are 
applied to the analysis of data. In general, the effect of flanking sequence 
composition on the binding of the protein is likely to be correlated with the 
length of the recognition sequence for the DNA-binding protein. That is, the 
kinetics of binding for proteins with shorter recognition sequences are more 
likely to suffer from flanking sequence effects, while the kinetics of binding 
for proteins with longer recognition sequences are less likely to be 
affected by flanking sequence composition. The present disclosure provides 
methods and guidance for testing the usefulness of such DNA:protein 
interactions in the screening assay. 

b.) The Use of UL9 Proteins in the Practice of the Present Invention. 

Experiments performed in support of the present invention have identified 
a DNA:protein interaction that is particularly useful for the above described 
assay: the Herpes Simplex Virus (HSV) UL9 protein that binds the HSV origin of 
replication (oriS). The UL9 protein has fairly stringent sequence specificity. 
There appear to be three binding sites for UL9 in oriS, SEQ ID NO: 601, SEQ ID 
NO: 602 and SEQ ID NO: 615 (Elias, et al.; Stow, et al.). One sequence (SEQ ID 
NO: 601) binds with at least 10-fold higher affinity than the second sequence 
(SEQ ID NO: 602): the embodiments described below use the higher affinity 
binding site (SEQ ID NO: 601). Another useful UL9-binding site, albeit a lower 
affinity binding site, SEQ ID NO: 641, has also been identified. 

DNA:protein association reactions are performed in solution. The 
DNA:protein complexes can be separated from free DNA by any of several methods. 
One particularly useful method for the initial study of DNA:protein 
interactions has been visualization of binding results using band shift gels 
(Example 3A). In this method, DNA:protein binding reactions are applied to 
polyacrylamide/TBE gels and the labeled complexes and free labeled DNA are 
separated electrophoretically. These gels are fixed, dried, and exposed to 
X-ray film. The resulting autoradiograms are examined for the amount of free 
probe that is migrating separately from the DNA:protein complex. These assays 
include (i) a lane containing only free labeled probe, and (ii) a lane where 
the sample is labeled probe in the presence of a large excess of binding 
protein. The band shift assays allow visualization of the ratios between 
DNA:protein complexes and free probe. However, they are less accurate than 
filter binding assays for rate-determining experiments due to the lag time 
between loading the gel and electrophoretic separation of the components. 

The filter binding method is particularly useful in determining the 
half-life for oligonucleotide:protein complexes (Example 3B). In the filter 
binding assay, DNA:protein complexes are retained on a filter while free DNA 
passes through the filter. This assay method is more accurate for half-life 
determinations because the separation of DNA:protein complexes from free probe 
is very rapid. The disadvantage of filter binding is that the nature of the 
DNA:protein complex cannot be directly visualized. So if, for example, the 
competing molecule was also a protein competing for the binding of a site on 
the DNA molecule, filter binding assays could neither differentiate between the 
binding of the two proteins nor yield information about whether one or both 
proteins are binding. 


c. Preparation of Full Length UL9 and UL9-COOH Polypeptides. 

UL9 protein has been prepared by a number of recombinant techniques 
(Example 2). The full length UL9 protein has been prepared from baculovirus 
infected insect cultures (Example 3A, B, and C). Further, a portion of the UL9 
protein that contains the DNA-binding domain (UL9-COOH) has been cloned into a 
bacterial expression vector and produced by bacterial cells (Example 3D and E). 
The DNA-binding domain of UL9 is contained within the C-terminal 317 amino 
acids of the protein (Weir, et al.). The UL9-COOH polypeptide was inserted 
into the expression vector in-frame with the glutathione-S-transferase (gst) 
protein. The gst/UL9 fusion protein was purified using affinity chromatography 
(Example 3E). The vector also contained a thrombin cleavage site at the 
junction of the two polypeptides. Therefore, once the fusion protein was 
isolated (Figure 8, lane 2) it was treated with thrombin, which cleaved the 
UL9-COOH/gst fusion protein and released UL9-COOH from the gst polypeptide 
(Figure 8, lane 3). The UL9-COOH-gst fusion polypeptide was obtained at a 
protein purity of greater than 95% as determined using Coomassie staining. 

Other hybrid proteins can be utilized to prepare DNA-binding proteins of 
interest. For example, a DNA-binding protein coding sequence can be fused 
in-frame with a sequence encoding the thrombin site and also in-frame with the 
β-galactosidase coding sequence. Such hybrid proteins can be isolated by 
affinity or immunoaffinity columns (Maniatis, et al.; Pierce, Rockford IL). 
Further, DNA-binding proteins can be isolated by affinity chromatography based 
on their ability to interact with their cognate DNA binding site. For example, 
the UL9 DNA-binding site (SEQ ID NO: 601) can be covalently linked to a solid 
support (e.g., CNBr-activated Sepharose 4B beads, Pharmacia, Piscataway NJ), 
extracts passed over the support, the support washed, and the DNA-binding 
protein then isolated from the support with a salt gradient (Kadonaga). 
Alternatively, other expression systems in bacteria, yeast, insect cells or 
mammalian cells can be used to express adequate levels of a DNA-binding protein 
for use in this assay. 

The results presented below in regard to the DNA-binding ability of the 
truncated UL9 protein suggest that full length DNA-binding proteins are not 
required for the DNA:protein assay of the present invention: only a portion of 
the protein containing the cognate site recognition function may be required. 
The portion of a DNA-binding protein required for DNA-binding can be evaluated 
using a functional binding assay (Example 4A). The rate of dissociation can be 
evaluated (Example 4B) and compared to that of the full length DNA-binding 
protein. However, any DNA-binding peptide, truncated or full length, may be 
used in the assay if it meets the criteria outlined in Section II.B.1, 
"Criteria for Choosing an Appropriate DNA-Binding Protein". This remains true 
whether or not the truncated form of the DNA-binding protein has the same 
affinity as the full length DNA-binding protein. 

d. Functional Binding and Rate of Dissociation. 

The full length UL9 and purified UL9-COOH proteins were tested for 
functional activity in "band shift" assays (see Example 4A). The buffer 
conditions were optimized for DNA:protein-binding (Example 4C) using the 
UL9-COOH polypeptide. These DNA-binding conditions also worked well for the 
full-length UL9 protein. Radiolabeled oligonucleotides (SEQ ID NO: 614) that 
contained the 11 bp UL9 DNA-binding recognition sequence (SEQ ID NO: 601) were 
mixed with each UL9 protein in appropriate binding buffer. The reactions were 
incubated at room temperature for 10 minutes (binding occurs in less than 2 
minutes) and the products were separated electrophoretically on non-denaturing 
polyacrylamide gels (Example 4A). 

The degree of DNA:protein-binding could be determined from the ratio of 
labeled probe present in DNA:protein complexes versus that present as free 
probe. This ratio was typically determined by optical scanning of 
autoradiograms and comparison of band intensities. Other standard methods may 
be used as well for this determination, such as scintillation counting of 
excised bands. The UL9-COOH polypeptide and the full length UL9 polypeptide, in 
their respective buffer conditions, bound the target oligonucleotide equally 
well. 
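
The band-intensity comparison described above reduces to a simple ratio. The 
sketch below is illustrative: the function name and the intensity values are 
hypothetical, and the calculation assumes that the scanned intensity of each 
band is proportional to the amount of labeled probe it contains. 

```python
def fraction_bound(complex_intensity, free_intensity):
    """Fraction of labeled probe present in DNA:protein complexes.

    Intensities may come from optical scanning of an autoradiogram or from
    scintillation counts of excised bands; the units cancel in the ratio.
    """
    total = complex_intensity + free_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return complex_intensity / total


# Hypothetical lane: 75 units in the shifted band, 25 units as free probe.
lane_fraction = fraction_bound(75.0, 25.0)
```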

The rate of dissociation was determined using competition assays. An 
excess of unlabeled oligonucleotide that contained the UL9 binding site was 
added to each reaction. This unlabeled oligonucleotide acts as a specific 
inhibitor, capturing the UL9 protein as it dissociates from the labeled 
oligonucleotide (Example 4B). The dissociation half-life, as determined by a 
band-shift assay, for both full length UL9 and UL9-COOH was approximately 4 
hours at 4°C or approximately 10 minutes at room temperature. Neither 
non-specific oligonucleotides (a 10,000-fold excess) nor sheared herring sperm 
DNA (a 100,000-fold excess) competed for binding with the oligonucleotide 
containing the UL9 binding site. 
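
A half-life can be estimated from such a competition time course by fitting 
a single exponential decay to the fraction of labeled probe remaining in 
complex. The sketch below is one possible way to do this and is not taken from 
the Examples: it assumes first-order dissociation, uses an ordinary 
least-squares fit of the logarithm of the bound fraction against time, and is 
run here on synthetic data generated with a 10-minute half-life. All names are 
hypothetical. 

```python
import math


def fit_half_life(times, fractions_bound):
    """Estimate the half-life from a least-squares fit of ln(fraction) vs. time.

    Assumes first-order dissociation; every fraction must be greater than 0.
    """
    logs = [math.log(f) for f in fractions_bound]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
        / sum((t - mean_t) ** 2 for t in times)
    )
    # For exponential decay the slope is -ln(2) / half-life.
    return -math.log(2) / slope


# Synthetic time course (minutes) generated with a 10-minute half-life.
sample_times = [0.0, 5.0, 10.0, 20.0, 40.0]
sample_fractions = [0.5 ** (t / 10.0) for t in sample_times]
estimated_half_life = fit_half_life(sample_times, sample_fractions)
```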

e. oriS Flanking Sequence Variation. 

As mentioned above, one feature of a DNA:protein-binding system to be 
used in the assay of the present invention is that the DNA:protein interaction 
is not affected by the nucleotide sequence of the regions adjacent to the 
DNA-binding site. The sensitivity of any DNA:protein-binding reaction to the 
composition of the flanking sequences can be evaluated by the functional 
binding assay and dissociation assay described above. 

To test the effect of flanking sequence variation on UL9 binding to the 
oriS SEQ ID NO: 601 sequence, oligonucleotides were constructed with 20-30 
different sequences (i.e., the test sequences) flanking the 5' and 3' sides of 
the UL9 binding site. Further, oligonucleotides were constructed with point 
mutations at several positions within the UL9 binding site. Most point 
mutations within the binding site destroyed recognition. Several changes did 
not destroy recognition and these include variations at sites that differ 
between the UL9 binding sites (SEQ ID NO: 601, SEQ ID NO: 602, SEQ ID NO: 615 
and SEQ ID NO: 641): the second UL9 binding site (SEQ ID NO: 602) shows a 
ten-fold decrease in UL9:DNA binding affinity (Elias, et al.) relative to the 
first (SEQ ID NO: 601). On the other hand, sequence variation at the test site 
(also called the test sequence), adjacent to the screening site (Figure 5, 
Example 5), had virtually no effect on binding or the rate of dissociation. 

The finding that the nucleotide sequence in the test site, which flanks 
the screening site, has no effect on the kinetics of UL9 binding in any of the 
oligonucleotides tested is striking. It allows the direct comparison of the 
effect of a DNA-binding molecule on test oligonucleotides that contain 
different test sequences. Since the only difference between test 
oligonucleotides is the difference in nucleotide sequence at the test site(s), 
and since the nucleotide sequence at the test site has no effect on UL9 
binding, any differential effect observed between any two test 
oligonucleotides in response to a DNA-binding molecule must be due solely to 
the differential interaction of the DNA-binding molecule with the test 
sequence(s). In this manner, the insensitivity of UL9 to the test sequences 
flanking the UL9 binding site greatly facilitates the interpretation of 
results. Each test oligonucleotide acts as a control sample for all other test 
oligonucleotides. This is particularly true when ordered sets of test 
sequences are tested (e.g., testing all 256 four base pair sequences 
(Figure 13) for binding to a single drug). 
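
An ordered set of this kind is straightforward to enumerate. The sketch 
below generates all 256 four-base sequences and places each one on both sides 
of a screening sequence. It is an illustration only: the function names are 
hypothetical, the screening sequence is an 11-nucleotide placeholder of N 
characters rather than SEQ ID NO: 601, and the layout of the actual test 
oligonucleotides (Figure 13) may differ from the symmetric arrangement assumed 
here. 

```python
from itertools import product


def all_test_sequences(length=4, alphabet="ACGT"):
    """Return every sequence of the given length, in alphabetical order."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


def build_test_oligo(screening_sequence, test_sequence):
    """Place the test sequence on both sides of the screening sequence.

    The symmetric layout is an assumption made for this sketch.
    """
    return test_sequence + screening_sequence + test_sequence


# Placeholder only: 11 unspecified bases, NOT the sequence of SEQ ID NO: 601.
SCREENING_PLACEHOLDER = "N" * 11

test_sequences = all_test_sequences()
oligos = [build_test_oligo(SCREENING_PLACEHOLDER, s) for s in test_sequences]
```

Four positions with four possible bases each give 4 x 4 x 4 x 4 = 256 
sequences, which is the size of the complete screen referred to above. 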

Taken together, the above experiments indicate that the UL9-COOH 
polypeptide binds the SEQ ID NO: 601 sequence with (i) appropriate strength, 
(ii) an acceptable dissociation time, and (iii) indifference to the nucleotide 
sequences flanking the screening site. These features suggested that the 
UL9/oriS system could provide a versatile assay for detection of small 
molecule/DNA-binding involving any number of specific nucleotide sequences. 

The above-described experiment can be used to screen other DNA:protein 
interactions to determine their usefulness in the present assay. 

f. Small Molecules as Sequence-Specific Competitive Inhibitors. 

To test the utility of the present assay system, several small molecules 
that have sequence-binding preferences (i.e., a preference for AT-rich versus 
GC-rich sequences) have been tested. 

Distamycin A binds relatively weakly to DNA (K_A = 2 x 10^5 M^-1) with a 
preference for non-alternating AT-rich sequences (Jain, et al.; Sobell; Sobell, 
et al.). Actinomycin D binds DNA more strongly (K_A = 7.6 x 10^-7 M^-1) than 
Distamycin A and has been reported to have a relatively strong preference for 
the dinucleotide sequence dGdC (Luck, et al.; Zimmer; Wartel). Each of these 
molecules poses a stringent test for the assay. Distamycin A tests the 
sensitivity of the assay because of its relatively weak binding. Actinomycin D 
challenges the ability to utilize flanking sequences since the UL9 recognition 
sequence contains a dGdC dinucleotide: therefore, it might be anticipated that 
all of the oligonucleotides, regardless of the test sequence flanking the assay 
site, might be equally affected by actinomycin D. 

In addition, Doxorubicin, a known anti-cancer agent that binds DNA in a 
sequence-preferential manner (Chen, K-X, et al.), has been tested for 
preferential DNA sequence binding using the assay of the present invention. 

Actinomycin D, Distamycin A, and Doxorubicin have been tested for their 
ability to preferentially inhibit the binding of UL9 to oligonucleotides 
containing different sequences flanking the UL9 binding site (Example 6, 
Figure 5). Furthermore, distamycin A and actinomycin D have been screened 
against all 256 possible 4 bp DNA sequences. Binding assays were performed as 
described in Example 5. These studies were completed under conditions in which 
UL9 is in excess of the DNA (i.e., most of the DNA is in DNA:protein 
complexes). 

In the preliminary studies, distamycin A was tested with 5 different test 
sequences flanking the UL9 screening sequence: SEQ ID NO: 605 to SEQ ID NO: 609. 
The results shown in Figure 10A demonstrate that Distamycin A preferentially 
disrupts binding to the test sequences UL9 polyT, UL9 polyA and, to a lesser 
extent, UL9 ATAT. Figure 10A also shows the concentration dependence of the 
inhibitory effect of distamycin A: at 1 µM distamycin A most of the 
DNA:protein complexes are intact (top band) with free probe appearing in the 
UL9 polyT and UL9 polyA lanes, and some free probe appearing in the UL9 ATAT 
lane; at 4 µM free probe can be seen in the UL9 polyT and UL9 polyA lanes; at 
16 µM free probe can be seen in the UL9 polyT and UL9 polyA lanes; and at 40 µM 
the DNA:protein complexes in the UL9 polyT, UL9 polyA and UL9 ATAT lanes are 
nearly completely disrupted while some DNA:protein complexes in the other lanes 
persist. These results were consistent with the reported preference of 
Distamycin A for non-alternating AT-rich sequences. 

Actinomycin D was tested with 8 different test sequences flanking the UL9 
screening sequence: SEQ ID NO: 605 to SEQ ID NO: 609, and SEQ ID NO: 611 to SEQ 
ID NO: 613. The results shown in Figure 10B demonstrate that actinomycin D 
preferentially disrupts the binding of UL9-COOH to the oligonucleotides UL9 
CCCG (SEQ ID NO: 605) and UL9 GGGC (SEQ ID NO: 606). These oligonucleotides 
contain, respectively, three or five dGdC dinucleotides in addition to the dGdC 
dinucleotide within the UL9 recognition sequence. This result is consistent 
with the results described in the literature for Actinomycin D binding to the 
dinucleotide sequence dGdC. Apparently the presence of a potential preferred 
target site within the screening sequence (oriS, SEQ ID NO: 601), as mentioned 
above, does not interfere with the function of the assay. 

Doxorubicin was tested with 8 different test sequences flanking the UL9 
screening sequence: SEQ ID NO: 605 to SEQ ID NO: 609, and SEQ ID NO: 611 to SEQ 
ID NO: 613. The results shown in Figure 10C demonstrate that Doxorubicin 
preferentially disrupts binding to oriEco3, the test sequence of which differs 
from oriEco2 by only one base (compare SEQ ID NO: 612 and SEQ ID NO: 613). 
Figure 10C also shows the concentration dependence of the inhibitory effect of 
Doxorubicin: at 15 µM Doxorubicin, the UL9 binding to the screening sequence 
is strongly affected when oriEco3 is the test sequence, and more mildly 
affected when polyT, UL9 GGGC, or oriEco2 is the test sequence; and at 35 µM 
Doxorubicin most DNA:protein complexes are nearly completely disrupted, with 
UL9 polyT and UL9 ATAT showing some DNA still complexed with protein. Also, 
effects similar to those observed at 15 µM were observed using Doxorubicin 
at 150 nM, but at a later time point. 

The feasibility studies performed with the limited set of test sequences, 
described above, provided evidence that the results of the assay are not 
inconsistent with the results reported in the literature. However, the 
screening of all 256 possible four base-pair sequences, using the assay of the 
present invention, provides a much more extensive overview of the sequence 
preferences of distamycin A and actinomycin D. 

The actual ranking of values obtained from the assay, for any given test 
compound, can be variable. A number of sequences having similar affinity can be 
clustered together: although absolute rank might not be determinable, relative 
ranks can be determined. 
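
One simple way to express relative rather than absolute rank is to group 
sequences whose assay values lie within a chosen tolerance of one another. 
The sketch below is a hypothetical illustration and is not the analysis used 
in the Examples: the function name, the tolerance, and the scores are all 
invented for the purpose of showing the grouping. 

```python
def group_into_tiers(scores, tolerance):
    """Group sequences whose scores lie within `tolerance` of a tier's top score.

    `scores` maps a test-sequence label to an assay value in which a higher
    value means a stronger effect of the test compound.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    tiers = []
    for label, score in ordered:
        # tiers[-1][0] is the highest-scoring member of the current tier.
        if tiers and tiers[-1][0][1] - score <= tolerance:
            tiers[-1].append((label, score))
        else:
            tiers.append([(label, score)])
    return [[label for label, _ in tier] for tier in tiers]


# Hypothetical scores, for illustration only.
example_scores = {"TTTT": 0.90, "AAAA": 0.88, "ATAT": 0.60,
                  "GGGC": 0.10, "CCCG": 0.12}
example_tiers = group_into_tiers(example_scores, tolerance=0.05)
```

Sequences within a tier are treated as indistinguishable in rank, while the 
order of the tiers themselves is retained. 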

The results obtained in the feasibility studies with both distamycin A 
and actinomycin D were corroborated by the results obtained in the screen of 
all 256 sequences. In other words, the rank of the oligonucleotides remained 
internally consistent in the larger screen. Further, the screens of distamycin 
A and actinomycin D both support the general hypotheses described in the 
literature: that is, distamycin A has a preference for binding AT-rich 
sequences while actinomycin D has a preference for binding GC-rich sequences. 
However, both drug screens of all possible 4 bp sequences revealed additional 
characteristics that have not been described in the literature. 

Based on the data from 4 separate experiments (Examples 10 and 11;
Figures 15, 16 and 17), consensus sequences can be derived for distamycin
binding. One consensus sequence (Example 11) is relatively AT-rich, although
the preference in the 4th base position is distinctly G or C. The other
consensus sequence (Example 11) is relatively GC-rich, with some of the
sequences having a 75% GC-content. As noted above, the assay data are
consistent with distamycin binding data shown in the literature.

The ability of the assay to distinguish sequence binding preference using
weak DNA-binding molecules with relatively poor sequence-specificity (such as
distamycin A) is a stringent test of the assay. Accordingly, the present assay
seems well-suited for the identification of molecules having better sequence
specificity and/or higher sequence binding affinity. Further, the results
demonstrate sequence preferential binding with the known anti-cancer drug
Doxorubicin. This result indicates the assay may be useful for screening
mixtures for molecules displaying similar characteristics that could be
subsequently tested for anti-cancer activities as well as sequence-specific
binding.

Other compounds that may be suitable for testing in the present 
DNA:protein system or for defining alternate DNA:protein systems include the 
following categories of DNA-binding molecules. 

A first category of DNA-binding molecules includes non-intercalating
major and minor groove DNA-binding molecules. For example, two major classes
of major groove binding molecules are DNA-binding proteins (or peptides) and
nucleic acids (or nucleic acid analogs such as those with peptide or morpholino
backbones) capable of forming triplex DNA. There are a number of
non-intercalating minor groove DNA-binding molecules including, but not limited
to, the following: distamycin A, netropsin, mithramycin, chromomycin and
oligomycin, which are used as antitumor agents and antibiotics; and synthetic
antitumor agents such as berenil, phthalanilides, aromatic bisguanylhydrazones
and bisquaternary ammonium heterocycles (for review, see Baguley, 1982).
Non-intercalating DNA-binding molecules vary greatly in structure: for example,
the netropsin-distamycin series are oligopeptides, whereas berenil and
stilbamidine are diarylamidines.

A second category of DNA-binding molecules includes intercalating
DNA-binding molecules. Intercalating agents are an entirely different class of
DNA-binding molecules that have been identified as antitumor therapeutics and
include molecules such as daunomycin (Chaires, et al.) and nogalomycin (Fox, et
al., 1988) (see Remers, 1984).

A third category of DNA-binding molecules includes molecules that have
both groove-binding and intercalating properties. DNA-binding molecules that
have both intercalating and minor groove binding properties include actinomycin
D (Goodisman, et al.), echinomycin (Fox, et al., 1990), triostin A (Wang, et
al.), and luzopeptin (Fox, 1988). In general, these molecules have one or two
planar polycyclic moieties and one or two cyclic oligopeptides. Luzopeptins,
for instance, contain two substituted quinoline chromophores linked by a cyclic
decadepsipeptide. They are closely related to the quinoxaline family, which
includes echinomycin and triostin A, although the luzopeptins have ten amino
acids in the cyclic peptide, while the quinoxaline family members have eight
amino acids.

In addition to the major classes of DNA-binding molecules, there are also
some small inorganic molecules, such as cobalt hexamine, which is known to
induce Z-DNA formation in regions that contain repetitive GC sequences
(Gessner, et al.). Another example is cisplatin,
cis-diamminedichloroplatinum(II), which is a widely used anticancer
therapeutic. Cisplatin forms a covalent intrastrand crosslink between the N7
atoms of adjacent guanosines (Rice, et al.).

Furthermore, there are a few molecules, such as calicheamicin, that have
unusual biochemical structures that do not fall in any of the major categories.
Calicheamicin is an antitumor antibiotic that cleaves DNA and is thought to
recognize DNA sequences through carbohydrate moieties (Hawley, et al.).
Several DNA-binding molecules, such as daunomycin, A447C, and cosmomycin B, have
sugar groups, which may play a role in the recognition process.

Limited sequence preferences for some of the above drugs have been
suggested: for example, echinomycin is thought to preferentially bind to the
sequence (A/T)CGT (Fox, et al.). However, the absolute sequence preferences of
the known DNA-binding drugs have never been demonstrated. Despite the large
number of publications in this field, prior to the development of the assay
described herein, no methods were available for determining sequence
preferences among all possible binding sequences.

g. Theoretical Considerations on the Concentration of Assay Components.

There are two major components in the assay, the test oligonucleotide
(i.e., the test sequence) and the DNA-binding domain of UL9, which is described
below. A number of theoretical considerations have been employed in
establishing the assay system. In one embodiment of the invention, the assay
is used as a mass-screening assay: in this embodiment the smallest volumes and
concentrations possible are desirable. Each assay typically uses about 0.1-0.5
ng DNA in a 15-20 µl reaction volume (approximately 0.3-1.5 nM). The
protein concentration is in excess and can be varied to increase or decrease
the sensitivity of the assay. In the simplest scenario (steric hindrance),
where the small molecule is acting as a competitive inhibitor and the ratio of
DNA:protein and DNA-binding test molecule:DNA is 1:1, the system kinetics can
be described by the following equations:

$$D + P \rightleftharpoons D{:}P, \quad \text{where } k_{fp}/k_{bp} = K_{eq,p} = \frac{[D{:}P]}{[D][P]}$$

and

$$D + X \rightleftharpoons D{:}X, \quad \text{where } k_{fx}/k_{bx} = K_{eq,x} = \frac{[D{:}X]}{[D][X]}$$

D = DNA, P = protein, X = DNA-binding molecule, $k_{fp}$ and $k_{fx}$ are the
rates of the forward reaction for the DNA:protein interaction and DNA:drug
interaction, respectively, and $k_{bp}$ and $k_{bx}$ are the rates of the
backwards reactions for the respective interactions. Brackets, [], indicate
molar concentration of the components.

In the assay, both the protein, P, and the DNA-binding molecule or drug,
X, are competing for the DNA. If steric hindrance is the mechanism of
inhibition, the assumption can be made that the two molecules are competing for
the same site. When the concentration of DNA equals the concentration of the
DNA:drug or DNA:protein complex, the equilibrium binding constant, $K_{eq}$, is
equal to the reciprocal of the protein concentration (1/[P]). When all three
components are mixed together, the relationship between the drug and the
protein can be described as:

$$K_{eq,p} = z\,(K_{eq,x})$$

where "z" defines the difference in affinity for the DNA between P and X. For
example, if z = 4, then the affinity of the drug is 4-fold lower than the
affinity of the protein for the DNA molecule. The concentration of X,
therefore, must be 4-fold greater than the concentration of P, to compete
equally for the DNA molecule. Thus, the equilibrium affinity constant of UL9
will define the minimum level of detection with respect to the concentration
and/or affinity of the drug. Low affinity DNA-binding molecules will be
detected only at high concentrations; likewise, high affinity molecules can be
detected at relatively low concentrations. With certain test sequences,
complete inhibition of UL9 binding at markedly lower concentrations than
indicated by these analyses has been observed, probably indicating that
certain sites among those chosen for feasibility studies have affinities higher
than previously published. Note that relatively high concentrations of known
drugs can be utilized for testing sequence specificity. In addition, the
binding constant of UL9 can be readily lowered by altering the pH or salt
concentration in the assay if it ever becomes desirable to screen for molecules
that are found at low concentration (e.g., in a fermentation broth or extract).
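
The competitive relationship described above can be illustrated numerically.
The following is a minimal sketch only, assuming simple 1:1 competitive binding
at a single site and assuming that protein and drug are both in large excess
over DNA, so that their free concentrations approximate their total
concentrations. The function name and all numerical values are hypothetical
and are not taken from the assay data.

```python
def fraction_of_dna(k_eq_p, p_conc, k_eq_x, x_conc):
    """Return (free, protein-bound, drug-bound) fractions of the DNA.

    Assumptions (not from the source): one site per DNA molecule, mutually
    exclusive binding of protein P and drug X, and [P], [X] >> [DNA].
    """
    p_term = k_eq_p * p_conc   # equals [D:P]/[D]
    x_term = k_eq_x * x_conc   # equals [D:X]/[D]
    total = 1.0 + p_term + x_term
    return 1.0 / total, p_term / total, x_term / total


# Hypothetical values: the drug binds 4-fold more weakly than the protein
# (z = 4) and is supplied at 4-fold the protein concentration.
z = 4.0
k_p = 1.0e9        # protein affinity constant, 1/M (illustrative)
k_x = k_p / z      # K_eq,p = z * K_eq,x
p = 1.0e-9         # protein concentration, M (illustrative)
x = z * p          # drug concentration, M

free, bound_p, bound_x = fraction_of_dna(k_p, p, k_x, x)
```

Under these assumptions the protein-bound and drug-bound fractions come out
equal, which is the "compete equally" condition stated in the text for a
z-fold excess of a z-fold weaker binder.
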
The system kinetic analysis becomes more complex if more than one protein
or drug molecule is bound by each DNA molecule. As an example, if UL9 binds as
a dimer,

$$D + 2P \rightleftharpoons DP_2$$

then the affinity constant becomes dependent on the square of the protein
concentration:

$$K = \frac{[DP_2]}{[D][P]^2}$$

The same reasoning holds true for the DNA-binding test molecule, X; if,

$$D + 2X \rightleftharpoons DX_2$$

then the affinity constant becomes dependent on the square of the drug
concentration:

$$K = \frac{[DX_2]}{[D][X]^2}$$

Similarly, if the molar ratio of DNA to DNA-binding test molecule was 1:3, the
affinity constant would be dependent on the cube of the drug concentration.

Experimentally, the ratio of molar components can be determined. Given
the chemical equation:

$$xD + yP \rightleftharpoons D_xP_y,$$

the affinity constant may be described as

$$K = \frac{[D_xP_y]}{[D]^x[P]^y}$$

where [] indicates concentration, D = DNA, P = protein, x = number of DNA
molecules per DNA:protein complex, and y = number of protein molecules per
DNA:protein complex. By determining the ratio of DNA:protein complex to free
DNA, one can solve for x and y:

if $x_{total} = x_{free} + x_{bound}$;

if a = the fraction of DNA that is free, then the fraction of DNA that is
bound can be described as 1-a; and if $x_{bound}:x_{free}$ (the ratio of
DNA:protein complex to free DNA) is known for more than one DNA concentration.
This is because the affinity constant should not vary at different DNA
concentrations. Therefore,

$$K_{D:P,[D1]} = K_{D:P,[D2]}.$$

Substituting the right side of the equation above,

$$\frac{[D1_xP_y]}{[D1]^x[P]^y} = \frac{[D2_xP_y]}{[D2]^x[P]^y}.$$

Because the concentration of components in the assay can be varied and
are known, the molar ratio of the components can be determined. Therefore,
$[D1_xP_y]$ and $[D2_xP_y]$ can be described as $(1-a_1)[x_1]$ and
$(1-a_2)[x_2]$, respectively, and $[D1]$ and $[D2]$ can be described as
$(a_1)[x_1]$ and $(a_2)[x_2]$, respectively. [P] remains constant and is
described as $(y) - (y/x)(1-a)(x)$, where y is the total protein concentration
and $(y/x)(1-a)(x)$ is the protein complexed with DNA.
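
The substitutions above can be sketched in code. This is an illustration only:
the function names and the example concentrations are hypothetical, and the
sketch simply applies the expressions given in the text (complex = (1-a) times
total DNA, free DNA = a times total DNA, free protein = total protein minus
protein complexed with DNA), which is most directly interpretable when one DNA
molecule is present per complex (x = 1).

```python
import math


def free_protein(total_protein, total_dna, fraction_free, x=1, y=1):
    """Free protein: (y_total) - (y/x) * (1 - a) * (x_total), as in the text."""
    return total_protein - (y / x) * (1.0 - fraction_free) * total_dna


def affinity_constant(total_protein, total_dna, fraction_free, x=1, y=1):
    """K = [DxPy] / ([D]^x [P]^y), using the substitutions in the text."""
    complex_conc = (1.0 - fraction_free) * total_dna
    free_dna = fraction_free * total_dna
    p_free = free_protein(total_protein, total_dna, fraction_free, x, y)
    return complex_conc / (free_dna ** x * p_free ** y)


# Hypothetical measurement: 1 nM DNA, 10 nM protein, half of the DNA free,
# one DNA and two protein molecules per complex (x = 1, y = 2).
p_free_example = free_protein(10e-9, 1e-9, 0.5, x=1, y=2)
k_dimer = affinity_constant(10e-9, 1e-9, 0.5, x=1, y=2)
```

Repeating the calculation for measurements at a second DNA concentration and
checking which values of x and y give the same K is one way to apply the
constancy argument made above.
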

The system kinetic analyses become more complex if the inhibition is
allosteric (non-competitive inhibition) rather than competition by steric
hindrance. Nonetheless, the probability that the relative effect of an
inhibitor on different test sequences is due to its relative and differential
affinity to the different test sequences is fairly high. This is particularly
true in the assays in which all sequences within an ordered set (e.g., possible
sequences of a given length or all possible variations of a certain base
composition and defined length) are tested. In short, if the effect of
inhibition in the assay is particularly strong for a single sequence, then it
is likely that the inhibitor binds that particular sequence with higher
affinity than any of the other sequences. Furthermore, while it may be
difficult to determine the absolute affinity of the inhibitor, the relative
affinities have a high probability of being reasonably accurate. This
information will be most useful in facilitating, for instance, the refinement
of molecular modeling systems.

h. The Use of the Assay under Conditions of Very High Protein
Concentration.

When the screening protein is added to the assay system at very high
concentrations (i.e., high enough to force binding to non-specific sites), the
protein binds to non-specific sites on the oligonucleotide as well as the
screening sequence. This has been demonstrated using band shift gels: when
serial dilutions are made of the protein and mixed with a fixed concentration
of oligonucleotide, no binding (as seen by a band shift) is observed at very
low protein concentrations (e.g., a 1:100,000 dilution), a single band shift is
observed at moderate dilutions (e.g., 1:100) and a smear, migrating higher than
the single band observed at moderate dilutions, is observed at high
concentrations of protein (e.g., 1:10). The observation of a smear is
indicative of a mixed population of complexes, all of which presumably have the
screening protein binding to the screening sequence with high affinity, but in
addition have a larger number of proteins bound with markedly lower affinity to
other sites.

Some of the low affinity binding proteins are likely bound to the test
sequence. For example, when using the UL9-based system, the low affinity
binding proteins are likely UL9 or, less likely, glutathione-S-transferase:
these are the only proteins in the assay mixture. These proteins are
significantly more sensitive to interference by a molecule binding to the test
sequence for two reasons. First, the interference is likely to be by direct
steric hindrance and does not rely on induced conformational changes in the
DNA; secondly, the protein is a low affinity binding protein because the test
site is not a cognate-binding sequence. In the case of UL9, the difference in
affinity between the low affinity binding and the high affinity binding appears
to be at least two orders of magnitude.

The filter binding assays capture more DNA:protein complexes when more
protein is bound to the DNA. The relative results are accurate, but under
moderate protein concentrations, not all of the bound DNA (as demonstrated by
band shift assays) will bind to the filter unless there is more than one
DNA:protein complex per oligonucleotide (e.g., in the case of UL9, more than
one UL9:DNA complex). This makes the assay exquisitely sensitive under
conditions of high protein concentration. For instance, when actinomycin binds
DNA at a test site under conditions where there is one DNA:UL9 complex per
oligonucleotide, a preference for binding GC-rich oligonucleotides has been
observed; under conditions of high protein concentration, where more than one
DNA:UL9 complex is found per oligonucleotide, this binding preference is even
more apparent. These results suggest that the effect of actinomycin D on a
test site that is weakly bound by protein may be more readily detected than the
effect of actinomycin D on the adjacent screening sequence. Therefore,
employing high protein concentrations may increase the sensitivity of the
assay.

III. Amplification-Based Selection Technique to Determine the Sequence
Preferences of DNA-Binding Molecules.

A. Design of Test Oligonucleotides.

The above-described assay can be coupled to amplification methods (in one
embodiment, polymerase chain reaction (Mullis, et al.; Mullis; Innis, et al.))
to achieve identification of the sequences to which binding of a test molecule
is most preferred.

In this embodiment of the present invention, a double stranded test
oligonucleotide is synthesized that contains the following elements:

(i) the binding site for a DNA-binding protein (for example, UL9), i.e.,
the screening site,

(ii) adjacent to the screening site, a test site composed of more than two
base pairs and preferably less than 20 base pairs (most preferably 4-12 bases),
and

(iii) means to isolate selected sequences for amplification, such as a
sufficient number of bases flanking the test site sequences to function as
priming sites for polymerase chain reaction amplification or restriction sites
useful to facilitate cloning.

Priming sites can also be used as primer binding sites for dideoxy
sequencing reactions and may contain restriction endonuclease cleavage sites to
facilitate cloning manipulations.

The double-stranded test oligonucleotide can be generated by second-strand
synthesis using a primer complementary to the priming site at the 3' end
of the top-strand of the test oligonucleotide. Alternatively, both strands can
be generated by other means, such as chemical synthesis, and the
double-stranded test oligonucleotides can be generated by hybridization of the
strands.

An example of one such test oligonucleotide is shown in Figure 29A (SEQ
ID NO:630, SEQ ID NO:631 and SEQ ID NO:632). A specific example of a test
oligonucleotide is shown in Figure 29B (SEQ ID NO:633), where X=4. All possible
256 four base pair sequences are represented at equimolar levels within the
pool of oligonucleotides generated by this sequence design.

Another example of such a test oligonucleotide sequence is shown in
Figure 29C (SEQ ID NO:634), for an 8 base pair test sequence. In this pool of
mixed sequences, all possible 8 base pair sequences ($4^8$ = 65,536) are present
in equimolar amounts.
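
The pool sizes quoted above follow from enumerating every sequence of a given
length over the four bases. A minimal sketch, in which the function name is
hypothetical and only the test site itself (not the fixed flanking sequence of
the oligonucleotide) is generated:

```python
from itertools import product


def all_test_sequences(length):
    """List every possible test-site sequence of the given length."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


four_bp = all_test_sequences(4)   # the 256 four base pair test sites
eight_bp_count = 4 ** 8           # size of the 8 base pair pool
```
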

A second set of test oligonucleotides may be constructed in which the 
test site is placed on the other side of the DNA-binding protein recognition 
site (e.g., Figure 29D, SEQ ID NO:635). 

For any single-stranded test oligonucleotide pool, the single-stranded
molecules are annealed to a primer and the bottom strands are enzymatically
synthesized by primer extension reactions. One advantage of using the
assay/amplification PCR-cycling embodiment of the present invention is that it
is convenient to work with larger test sequences in this embodiment. This
protocol is geared to determining the highest affinity binding sequences and is
not capable of determining the rank of all test sequences nor of identifying
low affinity binding sites: such ranking can be determined by screening
individual sequences as described above.

B. Applying the Assay to the Mixed Pools of Test Oligonucleotides.

Using double-stranded test oligonucleotides, such as those just
described, the basic assay is performed essentially as described above (Section
I), typically without the use of radioactive detection systems. As previously
discussed, a number of DNA:protein interactions may be used in this assay
system. One example of such a system is the interaction of the DNA-binding
domain of UL9 (or UL9-COOH) with its cognate recognition sequence.

In this embodiment of the present invention, UL9-COOH is added to the
test oligonucleotide pool (for example, 256 four base pair sequences are
represented at equimolar levels within the pool of oligonucleotides described
above) in UL9 binding buffer. DNA-binding molecules are tested for the ability
to differentially disrupt the binding of the UL9-DNA:protein complex by binding
to the test sequence. After the addition of the test molecule or test mixture
(e.g., a fermentation broth or fungal extract), the assay mixture is incubated
for a desired time, then passed through a nitrocellulose filter. DNA:protein
(such as DNA:UL9) complexes are captured on the filter. DNA that is not bound
by protein passes through the filter (i.e., the filtrate) (step 1). The volume
of the assay is adjusted to accommodate the amount required for the filtering
process: that is, taking into consideration the losses incurred during the
filtering process.

C. Amplification.

In one embodiment, the DNA present in the filtrate is amplified using the
polymerase chain reaction (PCR) technology (Mullis; Mullis, et al.; Perkin
Elmer-Cetus). An aliquot of the resulting PCR-amplified material is cycled
through the DNA:protein binding assay again (step 2), then PCR-amplified again
(step 3). Steps 1-3 are repeated several times using each subsequent filtrate.
After each PCR amplification, part of the PCR-amplified material is retained
for sequencing analysis. The result of the repeated cyclings through the
assay/amplification process is that the test oligonucleotide sequences that are
amplified contain test sequences that are preferred binding sites for the test
molecules. Through subsequent rounds of assay/amplification, these
oligonucleotides are amplified to represent a larger and larger percent of the
total population of amplified DNA molecules.
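
The enrichment described above can be illustrated with a toy model. This is a
sketch under stated assumptions, not a description of the actual protocol: it
assumes each sequence has a fixed probability of passing through the filter in
every round (higher when the test molecule displaces the protein) and that
amplification is uniform across sequences. The function name, sequence labels
and probabilities are hypothetical.

```python
def enrich(fractions, pass_through, rounds):
    """Apply cycles of filtration followed by uniform amplification.

    fractions:    dict mapping sequence label -> starting fraction of the pool
    pass_through: dict mapping sequence label -> probability that the DNA is
                  found in the filtrate (i.e., not retained with protein)
    """
    current = dict(fractions)
    for _ in range(rounds):
        # Filtration: each sequence is kept in proportion to its pass-through.
        filtrate = {seq: frac * pass_through[seq] for seq, frac in current.items()}
        # Uniform amplification: restore the pool without changing proportions.
        total = sum(filtrate.values())
        current = {seq: amount / total for seq, amount in filtrate.items()}
    return current


# Hypothetical two-sequence pool, equimolar at the start.
start = {"seq_preferred": 0.5, "seq_other": 0.5}
pass_probability = {"seq_preferred": 0.6, "seq_other": 0.2}
after_three = enrich(start, pass_probability, rounds=3)
```

In this toy model the preferred sequence rises from one half of the pool to
27/28 of it after three rounds, which mirrors the qualitative statement that
preferred binding sites come to represent a larger and larger percent of the
amplified population.
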

In addition to PCR, the DNA present in the filtrate can be amplified by
other methods as well. For example, the DNA present in the filtrate can be
cloned into a selected vector (such as phage vectors, e.g., lambda-based, or
standard cloning vectors, e.g., pBR322- or pUC-based). The cloned sequences
are then transformed into an appropriate host organism in which the selected
vector can replicate (for example, bacteria or yeast). The transformed host
organism is cultured with concurrent amplification of the vectors containing
the cloned sequences. The vectors are then isolated by standard procedures
(Maniatis, et al.; Sambrook, et al.; Ausubel, et al.). Typically, the cloned
sequences, originally obtained from the DNA filtrate, are obtained from the
vector by restriction endonuclease digestion and size-fractionation (for
example, electrophoretic separation of the digestion products followed by
electroelution of the cloned sequences of interest) (Ausubel, et al.). These
isolated amplified test oligonucleotide sequences can then be recycled through
subsequent rounds of assay/amplification as described above.

In another embodiment, the oligonucleotide sequences present in the
original DNA filtrate can be isolated, sequenced and amplified by in vitro
synthesis of copies of the oligonucleotides.

D. Sequencing of Amplified DNA.

Samples from each cycle are sequenced using, for example, radio-labeled
primers and dideoxy sequencing methodologies (Sanger) or the chemical
methodologies outlined by Maxam and Gilbert. If the amplified sequences are
not sufficiently resolved to obtain unambiguous sequence information, then
the DNA is further purified and sequenced. For example, the DNA is cleaved at
the restriction endonuclease sites within the primer sequences and subcloned
into a convenient sequencing vector, such as "BLUESCRIPT" (Stratagene, La
Jolla, CA). The sequencing vectors carrying the amplified inserts are
transformed into bacteria. The resulting cloned vectors are isolated and
sequenced (in the case of "BLUESCRIPT," using the commercially available
primers and protocols).

IV. Modifications of Test Oligonucleotides and other Useful DNA:Protein
Interactions

One class of DNA:protein interactions that may be useful in the assay of
the present invention is the restriction endonuclease:restriction site class of
DNA:protein interactions. In the absence of divalent cations, restriction
endonucleases bind DNA but have no enzymatic activity (cleavage of DNA does not
take place without divalent cations). This allows the assay of the present
invention to be performed using a restriction endonuclease with its cognate
binding site as the screening sequence. The use of the restriction
endonuclease:restriction site interaction as the basis of the present assay is
described in greater detail in Section VI.B.4(c).

The test oligonucleotides of the present invention can be modified to
contain two different DNA:protein screening systems, i.e., two different
screening sequences with their respective cognate binding proteins. In the
assay described above, the UL9 screening sequence lies on one side of and
immediately adjacent to the test sequence. A second screening sequence, such
as a restriction endonuclease recognition sequence (restriction site), can be
introduced immediately adjacent to the other side of the test sequence.

Several restriction enzymes may recognize the same restriction site.
These enzymes are not identical, however, in that the cleavage sites may be at
the 5' end, the center, or the 3' end of the recognition sequence. For this
reason, a restriction site that is recognized by more than one restriction
enzyme may be incorporated adjacent to the test site. This allows a single
pool of test oligonucleotides to be used in assays employing three different
DNA:protein interactions: the screening sequence has the same sequence but the
restriction endonuclease used in the assay system is different in each case.
Using this method to design test oligonucleotides, the UL9 screening sequence
may be placed on one side of a test sequence and a restriction site screening
sequence (having three cognate binding proteins) may be placed on the other
side of the test sequence. Such a test oligonucleotide construction allows 4
different DNA:protein assay interaction systems to be employed with a single
pool of test sequences.

One example of test oligonucleotides using several different DNA:protein
interaction systems is shown in Figure 30. The top strands of the pool of
test oligonucleotides shown in Figure 30 have 6 base pair test sequences
(NNNNNN) and represent synthetic pools of all possible 4096 test sequences.
The remainder of the nucleotide sequence is fixed. The test oligonucleotides
contain the UL9 recognition sequence, 5'-CGTTCGCACTT-3' (underlined), on one
side of the test sequence and a restriction endonuclease binding sequence,
5'-GGTACC-3' (bold), on the other side of the test site. The restriction
endonuclease recognition sequence is recognized by the three different
restriction endonucleases Asp718, RsaI and KpnI. In Figure 30 the UL9 binding
site (screening sequence) is located 3' of the test sequence: the UL9 binding
site (screening sequence) can also be located 5' of the test sequence.
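
The layout described above can be sketched as a simple string construction.
This is an illustration only: the helper names are hypothetical, the flanking
primer sequences of Figure 30 are not reproduced here (they are left as empty
placeholders), and the placement of the restriction site immediately 5' of the
test site on the top strand is an assumption made for the sketch. The check on
GTAC relies on the generally known RsaI recognition sequence, which is not
stated in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

UL9_SITE = "CGTTCGCACTT"      # UL9 recognition sequence quoted in the text
RESTRICTION_SITE = "GGTACC"   # site recognized by Asp718 and KpnI (from the text)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def top_strand(test_sequence, left_flank="", right_flank=""):
    """Assemble flank + restriction site + test site + UL9 site + flank.

    The flank arguments stand in for the primer regions, whose sequences
    are not given in this section.
    """
    return left_flank + RESTRICTION_SITE + test_sequence + UL9_SITE + right_flank


example_oligo = top_strand("ACGTAC")
site_is_palindromic = reverse_complement(RESTRICTION_SITE) == RESTRICTION_SITE
contains_rsai_core = "GTAC" in RESTRICTION_SITE
```
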

The shorter sequences shown above the 5' and 3' ends of the test
oligonucleotides are primer sequences useful for sequencing and PCR
amplification. The primer sequences contain commonly used restriction
endonuclease sites for the purpose of subcloning into sequencing vectors.

Performing the assay with two or more different protein/screening
sequence systems allows the confirmation of putative high affinity binding
between a test compound and specific test sequences.

Alternatively, since there is no assurance that a test molecule that
binds the test sequence will have the same effect on protein binding at both
adjacent flanking sequences, simultaneous use of both test systems may reduce
the number of false negatives in an assay. For example, a test molecule may
not affect the binding of protein at one screening site but may affect the
binding of a different protein at the other screening site.

V. Capture/Detection Systems.

As an alternative to the above described band shift gels and filter
binding assays, the effect of inhibitors can be monitored by measuring
either the level of unbound DNA in the presence of test molecules or mixtures
or the level of DNA:protein complex remaining in the presence of test molecules
or mixtures. Measurements may be made either at equilibrium or, in a kinetic
assay, prior to the time at which equilibrium is reached. The type of
measurement is likely to be dictated by practical factors, such as the length
of time to equilibrium, which will be determined by both the kinetics of the
DNA:protein interaction as well as the kinetics of the DNA:drug interaction.
The results (i.e., the detection of DNA-binding molecules and/or the
determination of their sequence preferences) should not vary with the type of
measurement taken (kinetic or equilibrium).

Figure 2 illustrates an assay for detecting inhibitory molecules based on
their ability to preferentially hinder the binding of a DNA-binding protein.
In the presence of an inhibitory molecule (X) the equilibrium between the
DNA-binding protein and its binding site (screening sequence) is disrupted. The
DNA-binding protein (O) is displaced from DNA (/) in the presence of inhibitor
(X); the DNA free of protein or, alternatively, the DNA:protein complexes, can
then be captured and detected.

For maximum sensitivity, unbound DNA and DNA:protein complexes should be
sequestered from each other in an efficient and rapid manner. The method of
DNA capture should allow for the rapid removal of the unbound DNA from the
protein-rich mixture containing the DNA:protein complexes.

Even if the test molecules are specific in their interaction with DNA
they may have relatively low affinity and they may also be weak binders of
non-specific DNA or have non-specific interactions with DNA at low
concentrations. In either case, their binding to DNA may only be transient,
much like the transient binding of the protein in solution. Accordingly, one
feature of the assay is to take a molecular snapshot of the equilibrium state
of a solution comprised of the test oligonucleotide DNA, the protein, and the
inhibitory test molecule. In the presence of an inhibitor, the amount of DNA
that is not bound to protein will be greater than in the absence of an
inhibitor. Likewise, in the presence of an inhibitor, the amount of DNA that
is bound to protein will be less than in the absence of an inhibitor.

Any method used to separate the DNA:protein complexes from unbound DNA
should be rapid, because when the capture system is applied to the solution (if
the capture system is irreversible), the ratio of unbound DNA to DNA:protein
complex will change at a predetermined rate, based purely on the off-rate of
the DNA:protein complex. This step, therefore, determines the limits of
background. Unlike the protein and inhibitor, the capture system should bind
rapidly and tightly to the DNA or DNA:protein complex. The longer the capture
system is left in contact with the entire mixture of unbound DNA and
DNA:protein complexes in solution, the higher the background, regardless of the
presence or absence of inhibitor.
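
The dependence of background on capture time can be sketched with a simple
first-order model. This is an assumption made for illustration: it treats the
capture of unbound DNA as irreversible, so that DNA released from the protein
cannot rebind, and treats dissociation as a single exponential governed by the
off-rate. The function names and the off-rate value are hypothetical.

```python
import math


def complex_remaining(k_off, seconds):
    """Fraction of DNA:protein complex still intact after a capture time.

    Assumes first-order dissociation with rate constant k_off (per second)
    and no rebinding once the released DNA is captured.
    """
    return math.exp(-k_off * seconds)


def half_life(k_off):
    """Time for half of the complex to dissociate under the same model."""
    return math.log(2) / k_off


k_off_example = 0.01   # hypothetical off-rate, per second
remaining_10s = complex_remaining(k_off_example, 10.0)
remaining_60s = complex_remaining(k_off_example, 60.0)
background_10s = 1.0 - remaining_10s   # DNA released during capture
background_60s = 1.0 - remaining_60s
```

In this model the released fraction grows with contact time whether or not an
inhibitor is present, consistent with the statement that longer capture steps
raise the background.
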

Two exemplary capture systems are described below for use in the assay of
the present invention. One capture system has been devised to capture unbound
DNA (Section V.A). The other has been devised to capture DNA:protein complexes
(Section V.B). Both systems are amenable to high throughput screening assays.
The same detection methods (Section V.C) can be applied to molecules captured
using either capture system.

A. Capture of Unbound DNA.

One capture system that has been developed in the course of experiments
performed in support of the present invention utilizes a streptavidin/biotin
interaction for the rapid capture of unbound DNA from the protein-rich mixture,
which includes unbound DNA, DNA:protein complexes, excess protein and the test
molecules or test mixtures. Streptavidin binds with extremely high affinity
to biotin ($K_d = 10^{-15}$ M) (Chaiet, et al.; Green). Accordingly, two
advantages of the streptavidin/biotin system are that binding between the two
molecules can be rapid and the interaction is the strongest known non-covalent
interaction.

In this detection system a biotin molecule is covalently attached in the 
oligonucleotide screening sequence (i.e., the DNA-binding protein's binding
site). This attachment is accomplished in such a manner that the binding of
the DNA-binding protein to the DNA is not destroyed. Further, when the protein 
is bound to the biotinylated sequence, the protein prevents the binding of 
streptavidin to the biotin. In other words, the DNA-binding protein is able to
protect the biotin from being recognized by the streptavidin. This DNA:protein
interaction is illustrated in Figure 3. 

The capture system is described herein for use with the UL9/oriS system
described above. The following general testing principles can, however, be
applied to analysis of other DNA:protein interactions. The usefulness of this
system depends on the biophysical characteristics of the particular DNA:protein
interaction.

1. Modification of the Protein Recognition Sequence with Biotin.

The recognition sequence for the binding of the UL9 protein (Koff, et al.)
is underlined in Figure 4. Oligonucleotides were synthesized that
contain the UL9 binding site and were site-specifically biotinylated at a
number of locations throughout the binding sequence (SEQ ID NO:614; Example 1,
Figure 4). These biotinylated oligonucleotides were then used in band shift
assays to determine the ability of the UL9 protein to bind to the
oligonucleotide. These experiments, using the biotinylated probe and a
non-biotinylated probe as a control, demonstrate that the presence of a biotin
at the #8-T (biotinylated deoxyuridine) position of the bottom strand meets the
requirements listed above: the presence of a biotin moiety at the #8 position
of the bottom strand does not markedly affect the specificity of UL9 for the
recognition site. Further, in the presence of bound UL9, streptavidin does not
recognize the presence of the biotin moiety in the oligonucleotide.
Biotinylation at other A or T positions did not have the two necessary
characteristics (i.e., UL9 binding and protection from streptavidin):
biotinylation at the adenosine in position #8 of the top strand prevented the
binding of UL9; biotinylation of either adenosines or thymidines (top or bottom
strand) at positions #3, #4, #10, or #11 all allowed binding of UL9, but in
each case, streptavidin also was able to recognize the presence of the biotin
moiety and thereby bind the oligonucleotide in the presence of UL9.

The above result (the ability of UL9 to bind to an oligonucleotide 
55 containing a biotin within the recognition sequence and to protect the biotin 
from streptavidin) was unexpected in that methylation interference data (Koff, 
et al.) suggest that methylation of the deoxyguanosine residues at positions #7 
and #9 of the recognition sequence (on either side of the biotinylated
deoxyuridine) blocks UL9 binding. In these methylation interference
experiments, guanosines are methylated by dimethyl sulfate at the N7 position,
which corresponds structurally to the 5-position of the pyrimidine ring at
which the deoxyuridine is biotinylated. These moieties all protrude into the
major groove of the DNA. The methylation interference data suggest that the #7
and #9 position deoxyguanosines are contact points for UL9; it was therefore
unexpected that the presence of a biotin moiety between them would not
interfere with binding.

The binding of the full length protein was relatively unaffected by the 
presence of a biotin at position #8 within the UL9 binding site. The rate of 
dissociation was similar for full length UL9 with both biotinylated and
unbiotinylated oligonucleotides. However, the rate of dissociation of the
truncated UL9-COOH polypeptide was faster with the biotinylated
oligonucleotides than with non-biotinylated oligonucleotides (for
non-biotinylated oligonucleotides the rate was comparable to that of the full
length protein with either DNA).

The binding conditions were optimized for UL9-COOH so that the dissociation
half-life of the truncated UL9 from the biotinylated oligonucleotide was 5-10
minutes (optimized conditions are given in Example 4), a rate compatible with a
mass screening assay. The use of multi-well plates to conduct the DNA:protein
assay of the present invention is one approach to mass screening.

2. Capture of Site-Specific Biotinylated Oligonucleotides.

The streptavidin:biotin interaction can be employed in several
different ways to remove unbound DNA from the solution containing the DNA,
protein, and test molecule or mixture. Magnetic polystyrene or agarose beads,
to which streptavidin is covalently attached or attached through a covalently
attached biotin, can be exposed to the solution for a brief period, then
removed by use, respectively, of magnets or a filter mesh. Magnetic
streptavidinated beads are currently the method of choice. Streptavidin has
been used in many of these experiments, but avidin is equally useful.

An example of a second method for the removal of unbound DNA is to attach 
streptavidin to a filter by first linking biotin to the filter, binding
streptavidin, then blocking nonspecific protein binding sites on the filter
with a nonspecific protein such as albumin. The mixture is then passed through
the filter; unbound DNA is captured and the protein-bound DNA passes through
the filter. This method can give high background due to partial retention of
the DNA:protein complex on the filter.

|jj One convenient method to sequester captured DNA is the use of 

*~ streptavidin-conjugated superparamagnetic polystyrene beads as described in 

X 35 Example 7. These beads are added to the assay mixture to capture the unbound 
\P- DNA. After capture of DNA, the beads can be retrieved by placing the reaction 

u tubes in a magnetic rack, which sequesters the beads on the reaction chamber 

7 L wall while the assay mixture is removed and the beads are washed. The captured 

f K DNA is then detected using one of several DNA detection systems, as described 

4 0" " below r -------- - - - - - - 

Alternatively, avidin-coated agarose beads can be used. Biotinylated
agarose beads (immobilized D-biotin, Pierce) are bound to avidin. Avidin, like
streptavidin, has four binding sites for biotin. One of these binding sites is
used to bind the avidin to the biotin that is coupled to the agarose beads via
a 16 atom spacer arm; the other biotin binding sites remain available. The
beads are mixed with binding mixtures to capture biotinylated DNA (Example 7).

Alternative methods (Harlow, et al.) to the bead capture methods just 
described include the following streptavidinated or avidinated supports:
low-protein-binding filters or 96-well plates.

B. Capture of DNA:Protein Complexes.

The amount of DNA:protein complex remaining in the assay mixture in the
presence of an inhibitory molecule can also be determined as a measure of the
relative effect of the inhibitory molecule. A net decrease in the amount of
DNA:protein complex in response to a test molecule is an indication of the
presence of an inhibitor. DNA molecules that are bound to protein can be
captured on nitrocellulose filters. Under low salt conditions, DNA that is not
bound to protein freely passes through the filter. Thus, by passing the assay
mixture rapidly through a nitrocellulose filter, the DNA:protein complexes and
unbound DNA molecules can be rapidly separated. This has been accomplished on
nitrocellulose discs using a vacuum filter apparatus or on slot blot or dot
blot apparatuses (all of which are available from Schleicher and Schuell,
Keene, NH). The assay mixture is applied to and rapidly passes through the
wetted nitrocellulose under vacuum conditions. Any apparatus employing
nitrocellulose filters or other filters capable of retaining protein while
allowing free DNA to pass through the filter would be suitable for this system.

C. Detection Systems.

For either of the above capture methods, the amount of DNA that has been
captured is quantitated. The method of quantitation depends on how the DNA has
been prepared. If the DNA is radioactively labelled, beads can be counted in a
scintillation counter, or autoradiographs can be taken of dried gels or
nitrocellulose filters. The amount of DNA has been quantitated in the latter
case by a densitometer (Molecular Dynamics, Sunnyvale, CA); alternatively,
filters or gels containing radiolabeled samples can be quantitated using a
phosphoimager (Molecular Dynamics). Further, the captured DNA may be detected
using a chemiluminescent or colorimetric detection system. 

Radiolabelling and chemiluminescence (i) are very sensitive, allowing the
detection of sub-femtomole quantities of oligonucleotide, and (ii) use
well-established techniques. In the case of chemiluminescent detection,
protocols have been devised to accommodate the requirements of a mass-screening
assay. Non-isotopic DNA detection techniques have principally incorporated
alkaline phosphatase as the detectable label given the ability of the enzyme to
give a high turnover of substrate to product and the availability of substrates
that yield chemiluminescent or colored products.

1. Radioactive Labeling.

Many of the experiments described above for UL9 DNA:protein-binding
studies have made use of radiolabelled oligonucleotides. The techniques
involved in radiolabelling of oligonucleotides have been discussed above. A
specific activity of 10^8-10^9 dpm per µg DNA is routinely achieved using
standard methods (e.g., end-labeling the oligonucleotide with adenosine
γ-[32P]-5' triphosphate and T4 polynucleotide kinase). This level of specific
activity allows small amounts of DNA to be measured either by autoradiography
of gels or filters exposed to film or by direct counting of samples in
scintillation fluid.

2. Chemiluminescent Detection.

For chemiluminescent detection, digoxigenin-labelled oligonucleotides
(Example 1) can be detected using the chemiluminescent detection system
"SOUTHERN LIGHTS," developed by Tropix, Inc. (Bedford, MA). The detection
system is diagrammed in Figures 11A and 11B. The technique can be applied to
detect DNA that has been captured on beads, on filters, or in solution.

Alkaline phosphatase is coupled to the captured DNA without interfering
with the capture system. To do this, several methods, derived from commonly
used ELISA techniques (Harlow, et al.; Pierce, Rockford IL), can be employed.
For example, an antigenic moiety is incorporated into the DNA at sites that
will not interfere with (i) the DNA:protein interaction, (ii) the DNA:drug
interaction, or (iii) the capture system. In the UL9 DNA:protein/biotin system
the DNA has been end-labelled with digoxigenin-11-dUTP (dig-dUTP) and terminal
transferase (Example 1, Figure 4). After the DNA was captured and removed from
the DNA:protein mixture, an anti-digoxigenin-alkaline phosphatase conjugated
antibody (Boehringer Mannheim, Indianapolis IN) was then reacted with the
digoxigenin-containing oligonucleotide. The antigenic digoxigenin moiety was
recognized by the antibody-enzyme conjugate. The presence of dig-dUTP altered
neither the ability of UL9-COOH protein to bind the oriS (SEQ ID NO:601)-
containing DNA nor the ability of streptavidin to bind the incorporated biotin.

Captured DNA was detected using the alkaline phosphatase-conjugated
antibodies to digoxigenin as follows. One chemiluminescent substrate for
alkaline phosphatase is 3-(2'-spiroadamantane)-4-methoxy-4-
(3"-phosphoryloxy)phenyl-1,2-dioxetane disodium salt (AMPPD) (Example 7).
Dephosphorylation of AMPPD results in an unstable compound, which decomposes,
releasing a prolonged, steady emission of light at 477 nm. Light measurement is
very sensitive and can detect minute quantities of DNA (e.g., 10^2-10^3
attomoles) (Example 7).

Colorimetric substrates for the alkaline phosphatase system have also
been tested. While the colorimetric substrates are useable in the present
assay system, use of the light emission system is more sensitive.



An alternative to the above biotin capture system is to use digoxigenin
in place of biotin to modify the oligonucleotide at a site protected by the
DNA-binding protein at the assay site; biotin is then used to replace the
digoxigenin moieties in the above described detection system. In this
arrangement the anti-digoxigenin antibody is used to capture the
oligonucleotide probe when it is free of bound protein. Streptavidin conjugated
to alkaline phosphatase is then used to detect the presence of captured
oligonucleotides.

D. Alternative Methods for Detecting Molecules that Increase the
Affinity of the DNA-Binding Protein for its Cognate Site.

In addition to identifying molecules or compounds that cause a decreased 
affinity of the DNA-binding protein for the screening sequence, molecules may 
be identified that increase the affinity of the protein for its cognate binding 
site. In this case, leaving the capture system for unbound DNA in contact with
the assay for increasing amounts of time allows the establishment of a fixed
half-life for the DNA:protein complex (for example, using SEQ ID NO:601/UL9).
In the presence of a stabilizing molecule, the half-life, as detected by the
capture system time points, will be lengthened.

Using the capture system for DNA:protein complexes to detect molecules
that increase the affinity of the DNA-binding protein for the screening
sequence requires that an excess of unlabeled oligonucleotide containing the
UL9 binding site (but not the test sequences) be added to the assay mixture.
This is, in effect, an off-rate experiment. In this case, the control sample
(no test molecules or mixtures added) will show a fixed off-rate (for example,
samples would be taken at fixed intervals after the addition of the unlabeled
competition DNA molecule and applied to nitrocellulose, and a decreasing amount
of radiolabeled DNA:protein complex would be observed). In the presence of a
DNA-binding test molecule that enhanced the binding of UL9, the off-rate would
be decreased (i.e., the amount of radiolabeled DNA:protein complexes observed
would not decrease as rapidly at the fixed time points as in the control
sample).
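
The off-rate comparison described above can be sketched in Python. This
minimal example assumes first-order decay of the labeled complex after the
unlabeled competitor is added, and estimates the half-life from a least-squares
line through the logarithm of the signal. The function name, time points, and
signal values are hypothetical and generated only for illustration; they are
not experimental data.

```python
import math

def half_life_from_time_course(times_min, complex_signal):
    """Estimate the complex half-life from a line fitted to ln(signal) vs time."""
    logs = [math.log(s) for s in complex_signal]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    k_off = -sxy / sxx               # slope of ln(signal) vs time is -k_off
    return math.log(2) / k_off

# Made-up, noise-free signals (e.g., filter-bound counts) at fixed intervals.
times = [0, 2, 4, 8, 16]
control = [1000 * 0.5 ** (t / 5.0) for t in times]      # no test molecule
with_test = [1000 * 0.5 ** (t / 20.0) for t in times]   # stabilizing molecule

print(half_life_from_time_course(times, control))
print(half_life_from_time_course(times, with_test))
```

In this sketch a test molecule that enhances binding shows up as a longer
fitted half-life than the control, that is, a decreased off-rate.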

VI. Utility.

A. The Usefulness of Sequence-Specific DNA-Binding Molecules.

The present invention defines a high throughput in vitro screening assay
to test large libraries of biological or chemical mixtures for the presence of
DNA-binding molecules having sequence binding preference. The assay is also
capable of determining the sequence-specificity and relative affinity of known
DNA-binding molecules or purified unknown DNA-binding molecules.
Sequence-specific DNA-binding molecules are of particular interest for several
reasons, which are listed here. These reasons, in part, outline the rationale
for determining the usefulness of DNA-binding molecules as therapeutic agents:

First, for a given DNA:protein interaction, there are generally several
thousands fewer target DNA-binding sequences per cell than protein molecules
that bind to the DNA. Accordingly, even fairly toxic molecules might be
delivered in sufficiently low concentration to exert a biological effect by 
binding to the target DNA sequences. 

Second, DNA has a relatively more well-defined structure compared to RNA 
or protein. Since the general structure of DNA has less tertiary structural 
variation, identifying or designing specific binding molecules should be easier
for DNA than for either RNA or protein. Double-stranded DNA is a repeating 
structure of deoxyribonucleotides that stack atop one another to form a linear 
helical structure. In this manner, DNA has a regularly repeating "lattice" 
structure that makes it particularly amenable to molecular modeling refinements 
and, hence, drug design and development.

Third, since many single genes (i.e., genes which have only 1 or 2 copies 
in the cell) are transcribed into more than one, potentially as many as 
thousands of RNA molecules, each of which may be translated into many proteins, 
targeting any DNA site, whether it is a regulatory sequence, non-coding
sequence or a coding sequence, may require a much lower drug dose than
targeting RNAs or proteins. Proteins (e.g., enzymes, receptors, or structural
proteins) are currently the targets of most therapeutic agents. More recently,
RNA molecules have become the targets for antisense or ribozyme therapeutic
molecules.

Fourth, blocking the function of an RNA that encodes a protein or of the
protein itself when that protein regulates several cellular genes may have
detrimental effects, particularly if some of the regulated genes are important
for the survival of the cell. However, blocking a DNA-binding site that is 
specific to a single gene regulated by such a protein results in reduced 
toxicity. 

An example situation is HNF-1 binding to Hepatitis B virus (HBV): HNF-1
binds an HBV enhancer sequence and stimulates transcription of HBV genes 
(Chang, et al.). In a normal cell HNF-1 is a nuclear protein that appears to 
be important for the regulation of many genes, particularly liver-specific 
genes (Courtois, et al.). If molecules were isolated that specifically bound
to the DNA-binding domain of HNF-1, all of the genes regulated by HNF-1 would 
be down-regulated, including both viral and cellular genes. Such a drug could 
be lethal since many of the genes regulated by HNF-1 may be necessary for liver 
function. However, the assay of the present invention presents the ability to 
screen for a molecule that could distinguish the HNF-1 binding region of the 
Hepatitis B virus DNA from cellular HNF-1 sites by, for example, including 
divergent flanking sequences when screening for the molecule. Such a molecule 
would specifically block HBV expression without affecting cellular gene
expression. 

B. General Applications of the Assay.

General applications of the assay include but are not limited to:
screening libraries of unknown chemicals, either biological or synthetic
compounds, for sequence-specific DNA-binding molecules; determining the
sequence-specificity or preference and/or relative affinities of DNA-binding
molecules; testing of modified derivatives of DNA-binding molecules for
altered specificity or affinity; using the assay in secondary confirmatory or
mechanistic experiments; using the data generated from the above applications
to refine the predictive capabilities of molecular modeling systems; and using
the refined molecular modeling systems to generate a new "alphabet" of
DNA-binding subunits that can be polymerized to make novel heteropolymers
designed de novo to bind specific DNA target sites.

1. Mass-Screening of Libraries for the Presence of Sequence-
Specific DNA-Binding Molecules.

Many organizations (e.g., the National Institutes of Health,
pharmaceutical and chemical corporations) have large libraries of chemical or
biological compounds from synthetic processes or fermentation broths or
extracts that may contain as yet unidentified DNA-binding molecules. One
utility of the assay is to apply the assay system to the mass-screening of
these libraries of different broths, extracts, or mixtures to detect the
specific samples that contain the DNA-binding molecules. Once the specific
mixtures that contain the DNA-binding molecules have been identified, the assay
has a further usefulness in aiding in the purification of the DNA-binding
molecule from the crude mixture. As purification schemes are applied to the
mixture, the assay can be used to test the fractions for DNA-binding activity.

The assay is amenable to high throughput (e.g., a 96-well plate format 
automated on robotics equipment such as a Beckman Biomek workstation [Beckman, 
Palo Alto, CA] with detection using semi-automated plate-reading densitometers, 
luminometers, or phosphoimagers).

The concentration of protein used in mass-screening is determined by the 
sensitivity desired. The screening of known compounds, as described in Section
VI.B.2, is typically performed in protein excess at a protein concentration
high enough to produce 90-95% of the DNA bound in DNA:protein complex. The
assay is very sensitive to discriminatory inhibition at this protein 
concentration. For some mass-screening, it may be desirable to operate the 
assay under higher protein concentration, thus decreasing the sensitivity of 
the assay so that only fairly high affinity molecules will be detected: for 
example, when screening fermentation broths with the intent of identifying high 
affinity binding molecules. The range of sensitivities in the assay will be 
determined by the absolute concentration of protein used. 
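
The relationship between protein concentration and the fraction of DNA bound
can be sketched with a simple model. The Python example below assumes 1:1
equilibrium binding with protein in large excess over DNA, so that the free
protein concentration is approximately the total; this model and the function
names are assumptions made for illustration and are not stated in the text
above.

```python
def fraction_dna_bound(protein_conc, kd):
    """Fraction of DNA in complex for 1:1 binding with protein in excess."""
    return protein_conc / (protein_conc + kd)

def protein_conc_for_fraction(fraction, kd):
    """Protein concentration (same units as kd) giving the requested fraction bound."""
    return kd * fraction / (1.0 - fraction)

# Express the protein concentration in multiples of a hypothetical Kd of 1.0.
for target in (0.90, 0.95):
    print(target, protein_conc_for_fraction(target, 1.0))
```

Under this assumed model, 90% bound corresponds to a protein concentration of
about 9 times the dissociation constant and 95% bound to about 19 times.
Raising the protein concentration further changes the bound fraction very
little, which is consistent with the reduced sensitivity described above.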

One utility of the method of the present invention, under conditions 
using a relatively insensitive system (high [P]:[D] ratio), is as a screening
system for novel restriction enzymes. In this case, an ability to discriminate
between slight differences in affinity to different sequences may not be
necessary or desirable. Restriction enzymes have highly discriminatory
recognition properties: the affinity constants of a restriction endonuclease
for its specific recognition sequence versus non-specific sequences are orders
of magnitude different from one another. The assay may be used to screen
bacterial extracts for the presence of novel restriction endonucleases. The
256 test oligonucleotides described in Example 10, for example, may be used to
screen for novel restriction endonucleases with 4 bp recognition sequences.
The advantages of the system are that all possible 4 bp sequences are screened
simultaneously, that is, it is not limited to self-complementary sequences.
Further, any lack of specificity (such as more than one binding site) is
uncovered during the primary screening assay.
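
The count of 256 possible 4 bp sequences, and the much smaller subset that is
self-complementary, can be checked by enumeration. The following Python sketch
is purely combinatorial; the variable and function names are illustrative and
it does not reproduce the oligonucleotide design of Example 10.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

# Every 4 bp sequence over the four bases: 4**4 = 256 sequences.
all_sites = ["".join(bases) for bases in product("ACGT", repeat=4)]

# Self-complementary (palindromic) sites equal their own reverse complement.
self_complementary = [s for s in all_sites if s == reverse_complement(s)]

print(len(all_sites), len(self_complementary))  # 256 16
```

Only 16 of the 256 sequences are self-complementary, so a screen restricted to
such sites would omit 240 of the possible 4 bp sequences.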

2. Directed Screening.

The assay of the present invention is also useful for screening molecules
that are currently described in the literature as DNA-binding molecules but
with uncertain DNA-binding sequence specificity (i.e., having either no
well-defined preference for binding to specific DNA sequences or having certain
higher affinity binding sites but without defining the relative preference for
all possible DNA binding sequences). The assay can be used to determine the
specific binding sites for DNA-binding molecules, among all possible choices of
sequence that bind with high, low, or moderate affinity to the DNA-binding
molecule. Actinomycin D, Distamycin A, and Doxorubicin (Example 6) all provide
examples of molecules with these modes of binding. Many anti-cancer drugs,
such as Doxorubicin (see Example 6), show binding preference for certain
identified DNA sequences, although the absolute highest and lowest specificity
sequences have yet to be determined, because, until the invention described
herein, methods (Salas and Portugal; Cullinane and Phillips; Phillips; and
Phillips, et al.) for detecting differential affinity DNA-binding sites for any
drug were limited. Doxorubicin is one of the most widely used anti-cancer
drugs currently available. As shown in Example 6, Doxorubicin is known to bind
some sequences preferentially. Another example of such sequence binding
preference is Daunorubicin (Chen, et al.), which differs slightly in structure
from Doxorubicin (Goodman, et al.). Both Daunorubicin and Doxorubicin are
members of the anthracycline antibiotic family; antibiotics in this family,
and their derivatives, are among the most important newer antitumor agents
(Goodman, et al.).

The assay of the present invention allows the sequence preferences or
specificities of DNA-binding molecules to be determined. The DNA-binding
molecules for which sequence preference or specificity can be determined may
include small molecules such as aminoacridines and polycyclic hydrocarbons,
planar dyes, various DNA-binding antibiotics and anticancer drugs, as well as
DNA-binding macromolecules, such as peptides and polymers that bind to nucleic
acids (e.g., DNA and the derivatized homologs of DNA that bind to the DNA
helix).

The molecules that can be tested in the assay for sequence
preference/specificity and relative affinity to different DNA sites include
both major and minor groove binding molecules as well as intercalating and
non-intercalating DNA binding molecules.

3. Molecules Derived from Known DNA-binding Molecules.

The assay of the present invention facilitates the identification of
different binding activities by molecules derived from known DNA-binding
molecules. An example of this would be to identify and test derivatives of
anti-cancer drugs that have DNA-binding activity and then test for anti-cancer
activity through, for example, a battery of assays performed by the National
Cancer Institute (Bethesda MD). Further, the assay of the present invention
can be used to test derivatives of known anti-cancer agents to examine the
effect of the modifications on DNA-binding activity and specificity. In this
manner, the assay may reveal activities of anti-cancer agents, and derivatives
of these agents, that facilitate the design of DNA-binding molecules with
therapeutic or diagnostic applications in different fields, such as antiviral
or antimicrobial therapeutics. The binding-activity information for any
DNA-binding molecule, obtained by application of the present assay, can lead to
a better understanding of the mode of action of more effective therapeutics.

4. Secondary Assays.

As described above, the assay of the present invention is used (i) as a
screening assay to detect novel DNA-binding molecules, or (ii) to determine the
relative specificity and affinity of known molecules (or their derivatives).
The assay may also be used in confirmatory studies or studies to elucidate the
binding characteristics of DNA-binding molecules. Using the assay as a tool
for secondary studies can be of significant importance to the design of novel
DNA-binding molecules with altered or enhanced binding specificities and
affinities.



a.) Confirmatory Studies.

The assay of the present invention can be used in competition studies to
confirm and refine the original direct binding data obtained from the assay.

The primary screening assay does not provide for the direct determination 
of relative absolute affinities of test molecules for different test sequences.
A competition method has been developed that aids in the interpretation and 
confirmation of the primary screening assay. The competition method also 
provides a means for determining the minimum difference in absolute affinities 
of any test sequences for a given test molecule. 

Sequences of interest are tested for their ability to compete with the
test oligonucleotide for binding a test molecule of interest. In this method,
DNA molecules that contain sequences that are high affinity binding sites for
the DNA-binding test molecule compete effectively with the test oligonucleotide
for the binding of the test molecule. DNA molecules that contain sequences
that are low affinity binding sites for the test molecules are ineffective
competitors. In effect, the fold-difference in concentration required between
a high affinity competitor DNA and a low affinity competitor DNA, where the
competitor is required to compete with the test oligonucleotide for the binding
of the DNA-binding test molecule, should be proportional to the difference in
affinity between the two competitor DNA molecules.

Any test oligonucleotide may be used in the competition study. However,
in practice, since most secondary screening will be used to examine the
putative high affinity binding test sequences, the secondary competition assay
is typically used to test a competitor oligonucleotide which is a putative high
affinity test sequence.

In the competition assay, the assay conditions are essentially the same
as the conditions used in the primary screening assay. The assay components
are mixed, with the exception of the DNA. The mixture includes protein, buffer
and the DNA-binding test molecule (control samples lack the test molecule). A
test oligonucleotide is labeled (for example, using a radioisotope, although
any of the described capture/detection systems should be effective in the
competition study). The DNA sample, including the radiolabeled test
oligonucleotide and unlabelled competitor DNA, is added to the assay mixture.
Typically, the competitor DNA of interest is added to different reactions over
a range of competitor concentrations. Two controls are commonly run: (i) no
DNA binding test molecule added; and (ii) test DNA but no competitor DNA added.

The reactions are incubated for the desired time and the DNA:protein
complexes separated from free DNA (i.e., DNA not associated with protein) by
passing the mixture through nitrocellulose. Other capture systems, such as the
biotin/streptavidin system discussed in Section V, are also effective. The
amount of radiolabeled test oligonucleotide bound by protein (i.e., bound to 
the filter) is indicative of the effect of the competitor. 

One example of a competition assay is as follows. A test oligonucleotide 
containing the test sequence TTAC ranks as a high affinity binding site for a 
test molecule. The TTAC test oligonucleotide is radiolabeled and mixed with
non-radiolabeled competitor DNAs that contain, for example, a putative high
affinity binding site (the same site, TTAC, is one example) or a putative low
affinity binding site (e.g., CCCC). In the absence of any competing nonlabeled
DNA or DNA-binding test molecule, the amount of radiolabeled DNA:protein
complex observed (called r%) is arbitrarily established as 100%. The
concentration of the protein used in this experiment is high enough to bind
most of the radiolabeled test oligonucleotide in the absence of test molecules
or competing DNA molecules (this is essentially the same concentration as used
in the primary screening assay).

The test molecule is added to the reaction at a concentration sufficient
to markedly reduce r%, the amount of observed DNA:protein complex. The greater
the reduction in signal, the more easily competition is observed. The amount
of competitor DNA needed to observe competition is proportional to the amount
of DNA-binding test molecule used; therefore, the amount of test molecule used
should be sufficient to reduce r% to between approximately 10% and 70%. The
effect of an effective competitor, such as TTAC, is to cause r% to rise towards
100%.

The competition for test molecule binding is between the non-labeled
competitor DNA and the radiolabeled test oligonucleotide. As the competitor
DNA concentration increases, the test molecule binds to the competitor DNA and
is effectively removed from solution. Accordingly, the test molecule is no
longer able to block the binding of the protein to the radiolabeled
oligonucleotide. A less effective competitor, typically a competitor DNA with
low affinity for the test molecule, will compete less effectively for the
DNA-binding test molecule, even at substantially higher concentrations than the
high affinity competitor. A completely ineffective competitor, i.e., one that
did not bind the test molecule, would not cause the r% value to change, even at
high concentrations of the competitor DNA.

When a competitor DNA has some affinity for the test molecule,
competition (r% rising towards 100%) would be observed at some competitor DNA
concentration. The difference in the concentrations of two competing DNA
sequences needed to achieve an equivalent r% (e.g., 90%) should reflect the
relative difference in affinity of the test molecule for the two competitor
DNA molecules. For example, if 5 µM TTAC is required to achieve a change in r%
from 50% to 90% in the presence of a test molecule and 200 µM CCCC is required
to achieve the same change in r%, then the fold difference in affinity between
TTAC and CCCC for the test molecule is 200/5 = 40-fold.
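
The arithmetic of this comparison can be written out as a short sketch. The function names are illustrative assumptions; the two concentrations are the ones used in the example above.

```python
def fold_difference(conc_weaker, conc_stronger):
    """Fold difference in affinity, taken as the ratio of the competitor
    concentrations needed to reach the same r%."""
    return conc_weaker / conc_stronger


def relative_affinities(conc_required_uM):
    """Express each competitor's affinity relative to the best competitor,
    i.e. the one needing the lowest concentration (value 1.0)."""
    best = min(conc_required_uM.values())
    return {seq: best / conc for seq, conc in conc_required_uM.items()}


# Concentrations (µM) needed to move r% from 50% to 90%, from the text.
required = {"TTAC": 5.0, "CCCC": 200.0}

print(fold_difference(required["CCCC"], required["TTAC"]))
print(relative_affinities(required))
```

The same ratio applies to any pair of competitors, provided both concentrations are read at the same r% endpoint.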

In the context of screening distamycin with all 256 possible 4 bp test
sequences (Example 10), the confirmatory assay can be used (i) to confirm the
rankings observed in the assay, (ii) to refine the rankings among the 5-10
highest ranked binders (which show no statistical difference in rank with data
from 4 experiments), and (iii) to resolve perceived discrepancies in the assay
data. All of these goals may be accomplished using a competition experiment
which determines the relative ability of test sequences to compete for the
binding of distamycin.

The perceived discrepancy in the distamycin experiment is as follows:
test oligonucleotides that were complementary to most of the top-ranking test
sequence oligonucleotides scored poorly in the assay (Examples 10 and 11).
This result was unexpected since it is unlikely that the affinity of distamycin
for binding a test site depends on the orientation of the screening site to the
test site. More likely, the assay detects the binding of distamycin when the
molecule is bound to the test oligonucleotide in one orientation, but fails to
detect the binding of distamycin when the test sequence is in the other
orientation. A competition study will resolve this question, since the binding
of distamycin to a competitor sequence will be orientation-independent; the
competition does not depend on the mechanism of the assay.

For the competition experiment, the assay may be performed under any 
conditions suitable for the detection of drug binding. When these conditions 
are established, different competitor DNAs are added to the assay system to 
determine their relative ability to compete for drug binding with the
radiolabeled test oligonucleotide in the assay system. 

The competitor DNAs may be any sequence of interest. Several classes of
DNA may be tested as competitor molecules including, but not limited to, the
following: genomic DNAs, synthetic DNAs (e.g., poly(dA), poly(dI-dC), and
other DNA polymers), test oligonucleotides of varying sequences, or any
molecule of interest that is thought to compete for distamycin binding.

When using the competition assay to verify the results of a 256
oligonucleotide panel screen (like Example 10), the following criteria are
useful for selecting the competitor test oligonucleotides:
(i) sequences that rank high in the assay but whose relative binding
affinities do not differ from each other with statistical significance, in
order to determine their relative affinity with greater precision;

(ii) sequences that are purported by other techniques (e.g., footprinting
or transcriptional block analysis) to be high affinity binding sites, in order
to compare the results of those techniques with the screening assay results;

(iii) sequences that are complementary to test sequences that rank high
in the assay, in order to determine whether these complementary sequences are
false negatives; and

(iv) sequences of any rank in the assay, in order to confirm the assay
results.
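
Criterion (iii) can be sketched as a small selection routine. This is a minimal illustration under stated assumptions: the complement of a test sequence is taken to be its reverse complement written 5' to 3', the ranking shown is invented for the example, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the complementary strand of seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def complement_competitors(ranked_sites, top_n):
    """Pick the complements of the top_n ranked test sequences, skipping
    self-complementary sites and complements already among the top_n."""
    top = ranked_sites[:top_n]
    picked = []
    for site in top:
        rc = reverse_complement(site)
        if rc != site and rc not in top and rc not in picked:
            picked.append(rc)
    return picked


# Hypothetical ranking, best first (not data from Example 10).
ranking = ["TTAC", "AATT", "TTTA", "GTAA", "CCCC"]
print(complement_competitors(ranking, 3))
```

Self-complementary sites such as AATT are skipped because their complement is the same sequence and cannot serve as an independent competitor.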

Several methods may be used to perform the competition study as long as
the relative affinities of the competing DNA molecules are detectable. One
such method is described in Example 14. In this example, the concentrations of
the assay components (drug, protein, and DNA) are held constant relative to
those used in the original screening assay, but the molar ratio of the test
oligonucleotide to the competitor oligonucleotides is varied.

Another method for performing a competition assay is to hold the
concentrations of protein, drug and initial amount of test oligonucleotide
constant, then add a variable concentration of competitor DNA. In this design,
the protein and drug concentrations must be sufficiently high to allow the
addition of further competitor DNA without (i) decreasing the amount of
DNA:protein complex in the absence of drug to a level that is unsuitable for
detection of DNA:protein complex, and (ii) increasing the amount of DNA:protein
complex in the presence of drug to a level that is unsuitable for the detection
of drug binding. The window between detectable DNA:protein complex and
detectable effect of the drug must be wide enough to determine differences
among competitor DNAs.

In any competition method, it is important that the relative
concentrations of the competing DNA molecules are accurately determined. One
method for accomplishing accurate determination of the relative concentrations
of the DNA molecules is to tracer-label competitor molecules to a low specific
activity with a common radiolabeled primer (Example 14). In this manner, the
competitor molecules have the same specific activity, but are not sufficiently
radioactive (200-fold less than the test oligonucleotide) to contribute to the
overall radioactivity in the assay.

b.) Secondary Studies to Elucidate Binding Characteristics.

The studies outlined in Section VI.B.4.a describe methods of determining
some of the binding processes of distamycin A. The assay of the present
invention may also be used to explore mechanistic questions about distamycin
binding.

For example, several of the complements of the putative high affinity
binding sites for distamycin have low scores in the assay. As described above,
this may imply directionality in binding. The results may also imply that the
test sites are not equal with respect to the effect exerted on UL9-COOH
binding. Oligonucleotides can be designed to test the hypothesis of
directionality.

The basic test oligonucleotide has the structure presented in Figure 27A
(SEQ ID NO:621). In one scenario, the score in the binding assay is high,
i.e., the greatest effect of distamycin, when the test sequence is XYZZ
(Figure 27A, with the base X complementary to the base Y and the base Q
complementary to the base Z), and the complement (Figure 27B; SEQ ID NO:622)
scores low. These results imply that the test sites are not equivalent with
respect to their effect on UL9, otherwise the right site would have the effect
in one oligonucleotide and the left site would have the effect in the other.
These results further suggest that the effect of distamycin is directional.
The only assumption is that distamycin should bind with the same affinity to
the XYZZ/QQXY sequence (Figures 27A and 27B) regardless of its position or
orientation in the oligonucleotide. Since the scores are derived at
equilibrium, this is likely to be the case.

To test the hypothesis that one site is effective in the assay,
oligonucleotides may be designed that have the UL9 site inverted with respect
to the test sites (Figures 27C and 27D; SEQ ID NO:623 and SEQ ID NO:624,
respectively). If only one site is active with respect to UL9 and if the
Figure 27A oligo was most effective in binding distamycin, then oligo C
should be less active in the assay than oligo D; in other words, flipping the
UL9 site will result in QQXY ranking high and XYZZ ranking low.

Finally, to determine the "direction" of distamycin binding, the test
sequences can be mixed and the binding site inverted, as shown in the four
oligonucleotides presented in Figures 27E, 27F, 27G and 27H. Alternatively,
one test site or the other could be deleted from the test oligonucleotide.

This type of analysis provides an example of the usefulness of the assay
in determining binding properties of DNA-binding drugs.

c.) Restriction Endonucleases as Indicator Proteins in the
Assay. Restriction enzymes and their recognition sequences provide other
DNA:protein interactions that are useful as indicator proteins and screening
sequences, respectively. Such secondary screening assays are performed using
the same criteria used to establish conditions for the primary screening assay
(described in Example 4). The assay conditions can be varied to accommodate
different DNA:protein interactions, as long as the assay system follows the
functional criteria discussed above (Section I).

One limitation of using restriction endonucleases in the method of the
present invention is that the assay buffer should not contain divalent cations.
In the absence of divalent cations, the enzymes will bind the appropriate
recognition sequence, but not cleave the DNA. In the presence of divalent
cations, the test oligonucleotide can be cleaved at or near the protein binding
site.

By using different indicator proteins, a different recognition sequence
can be used to flank the test site. This variation allows the resolution of
questions regarding the potential binding of a test molecule to a site internal
to any single screening sequence. For example, consider the assay system in
which the UL9 protein and its recognition sequence are used as the indicator
protein:screening sequence interaction. In this system, if the highest
affinity binding site for a test molecule is TTAC, then several test sequences
may be predicted to rank high in the assay system; several of these test
sequences are presented in Figure 31. In Figure 31, the test site is shown in
bold, the potential binding site for the test molecule is shown underlined.
One test oligonucleotide on which the DNA-binding test molecule would be
predicted to have a high level of effect is the oligonucleotide containing the
test site, TTAC (Figure 31). However, since the UL9 recognition sequence
contains the sequence TT, flanking the test site, several other test
oligonucleotides might also be expected to have high activity in the assay (see
Figure 31).

By using a different DNA:protein interaction as the indicator system in a
secondary screening assay, the "false positives" shown for TACN and ACNN (shown
in Figure 31) can be identified. The recognition sequence for the protein in a
secondary screening assay simply needs to have a different screening sequence
in the region flanking the test site than the UL9 screening sequence.

Restriction endonucleases provide an entire class of different
DNA:protein interactions with a wide array of available sequences that can be
used in this manner. For example, SmaI recognizes the sequence 5'-CCCGGG-3'.
Using the SmaI:DNA interaction and the same test sequences presented in Figure
31, the resulting test oligonucleotides would have the test sequences presented
in Figure 32. As can be seen from a comparison of Figures 31 and 32, changing
the screening sequence from the UL9-binding sequence to the SmaI-binding
sequence eliminates the potential test molecule binding sites internal to the
screening sequence (e.g., compare TACN and ACNN in the figures).
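
The junction effect described above can be illustrated with a short enumeration. This is a sketch under loud assumptions: the flank strings below are placeholders (a left flank ending in TT stands in for the UL9 screening sequence, and CCCGGG for the SmaI site), the right flank is arbitrary, only one strand is scanned, and the actual oligonucleotide layouts are those of Figures 31 and 32, which are not reproduced here. The function name is hypothetical.

```python
from itertools import product


def junction_sites(left_flank, right_flank, motif, site_len=4):
    """List test sites that are not the motif itself but recreate the motif
    across a flank/test-site junction (one strand only). Assumes site_len
    equals len(motif), so every match involves at least one flank base."""
    k = len(motif) - 1
    # Only flank bases adjacent to the test site can join a spanning match.
    left = left_flank[max(0, len(left_flank) - k):]
    right = right_flank[:k]
    hits = []
    for bases in product("ACGT", repeat=site_len):
        site = "".join(bases)
        if site == motif:
            continue
        if motif in left + site + right:
            hits.append(site)
    return hits


# Placeholder flanks, for illustration only.
ul9_like = junction_sites("GGTT", "GGGG", "TTAC")
smai_like = junction_sites("CCCGGG", "GGGG", "TTAC")
print(len(ul9_like), sorted(ul9_like)[:3])
print(smai_like)
```

With these placeholder flanks, the first call returns the 16 ACNN sites and the 4 TACN sites, while the second returns none, which mirrors the TACN and ACNN comparison made between the two figures.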

The use of different DNA-binding proteins as indicator proteins in the 
assay is also applicable to the PCR-based test oligonucleotide selection 
technology (Section III).

5. Generation of Binding Data and Refinement of Molecular
Modeling Systems.

The assay of the present invention generates data which can be
applied to the refinement of molecular modeling systems that address DNA
structural analysis; the data is also useful in the design and/or refinement
of DNA-binding drugs. Traditionally, mass screening has been the only
reasonable method for discovering new drugs. Modern rational drug design seeks
to minimize laboratory screening. However, ab initio rational drug design is
difficult at this time given (i) insufficiencies in the underlying theories
used for de novo design, and (ii) the computational intensity which accompanies
such design approaches.

The ab initio approach requires calculations from first principles by
quantum mechanics; such an approach is expensive and time-consuming. The
introduction of data concerning the relative binding affinities of one or more 
DNA-binding molecules to all 256 four base pair DNA sequences allows the 
development, via molecular modeling, of ad hoc protocols for DNA structural 
analysis and subsequent DNA-binding drug design. The accumulation of data for 
the DNA sequences to which small molecules bind is likely to result in more 
accurate, less expensive molecular modeling programs for the analysis of DNA. 

The screening capacity of the assay of the present invention is much
greater than screening a single DNA sequence with an individual cognate
DNA-binding protein. Direct competition assays involving individual
receptor:ligand complexes (e.g., a specific DNA:protein complex) are most
commonly used for mass screening efforts. Each such assay requires the
identification, isolation, purification, and production of the assay
components. In particular, a suitable DNA:protein interaction must be
identified for each selected screening sequence. Using the assay of the
present invention, libraries of synthetic chemicals or biological molecules can
be screened to detect molecules that have preferential binding to virtually any
specified DNA sequence, all using a single assay system. When employing the
assay of the present invention, secondary screens involving the specific
DNA:protein interaction may not be necessary, since inhibitory molecules
detected in the assay may be tested directly on a biological system: for
example, for the ability to disrupt viral replication in a tissue culture or
animal model.

6. The Design of New DNA-Binding Heteropolymers Comprised of
Subunits Directed to Different DNA Sequences.

The assay of the present invention will facilitate the predictive
abilities of molecular modeling systems in two ways. First, ad hoc methods of
structural prediction will be improved. Second, by employing pattern matching
schemes, the comparison of sequences having similar or different affinities for
a given set of DNA-binding molecules should empirically reveal sets of
sequences that have similar structures (see Section VI.D, Using a Test Matrix).
Molecular modeling programs are "trained" using the information concerning
DNA-binding molecules and their preferred binding sequences. With this
information coupled to the predictive power of molecular modeling programs, the
design of DNA-binding molecules (subunits) that could be covalently linked
becomes feasible.

These molecular subunits would be directed at defined sections of DNA.
For example, a subunit would be designed for each possible DNA unit: if
single bases were the binding target of the subunits, then four
subunits would be required, one to correspond to each base pair. These
subunits could then be linked together to form a DNA-binding polymer, where the
DNA binding preference of the polymer corresponds to the sequence binding
preferences of the subunits in the particular order in which the subunits are
assembled.

Another example of such a polymer uses subunits whose binding is
directed at two base sections of DNA. In this case, 4^2 = 16 subunits would be
used, each subunit having a binding affinity for a specific two base pair
sequence (e.g., AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT).

If the polymers were to be comprised of subunits targeted to 3 base pair
sections of DNA, then 4^3 = 64 subunits would be prepared. The design of such
molecular subunits is dependent upon the establishment of a refined database
using empirical data derived by the method of the present invention, as
described in Section VI.B.
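
The subunit counts given above follow from enumerating every sequence of a given length over the four bases. The sketch below does this and also splits a target sequence into consecutive subunit-sized sections; the function names and the example target are illustrative assumptions only.

```python
from itertools import product


def subunit_targets(n):
    """Return every DNA sequence of length n; one subunit would be needed
    per sequence, so the list has 4**n entries."""
    return ["".join(bases) for bases in product("ACGT", repeat=n)]


def polymer_units(target, n):
    """Split a target sequence into consecutive n-base sections, i.e. the
    order in which the corresponding subunits would be assembled."""
    if len(target) % n != 0:
        raise ValueError("target length must be a multiple of n")
    return [target[i:i + n] for i in range(0, len(target), n)]


for n in (1, 2, 3):
    print(n, len(subunit_targets(n)))

print(subunit_targets(2))
print(polymer_units("TTACGG", 2))  # hypothetical 6 bp target
```

For a two-base subunit design, the hypothetical target TTACGG would call for the TT, AC and GG subunits linked in that order.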

C. Sequences Targeted by the Assay.

The DNA:protein assay of the present invention has been designed to
screen for compounds that bind a full range of DNA sequences that vary in
length as well as complexity. Sequence-specific DNA-binding molecules
discovered by the assay have potential usefulness as either molecular reagents,
therapeutics, or therapeutic precursors. Sequence-specific DNA-binding
molecules are potentially powerful therapeutics for essentially any disease or
condition that in some way involves DNA. Examples of test sequences for the
assay include: a) binding sequences of factors involved in the maintenance or
propagation of infectious agents, especially viruses, bacteria, yeast and other
fungi, b) sequences causing the inappropriate expression of certain cellular
genes, and c) sequences involved in the replication of rapidly growing cells.
Furthermore, gene expression or replication need not necessarily be disrupted
by blocking the binding of specific proteins. Specific sequences within
protein-coding regions of genes (e.g., oncogenes) are equally valid test
sequences since the binding of small molecules to these sequences is likely to
perturb the transcription and/or replication of the region. Finally, any
molecules that bind DNA with some sequence specificity, that is, not just to
one particular test sequence, may still be useful as anti-cancer agents.
Several small molecules with some sequence preference are already in use as
anticancer therapeutics. Molecules identified by the present assay may be
particularly valuable as lead compounds for the development of congeners having
either different specificity or different affinity.

One advantage of the present invention is that the assay is capable of
screening for binding activity directed against any DNA sequence. Such
sequences can be medically significant target sequences, scrambled or randomly
generated DNA sequences, or well-defined, ordered sets of DNA sequences. Other
sets could be used for screening for molecules demonstrating sequence
preferential binding (like Doxorubicin) to determine the sequences with highest
binding affinity and/or to determine the relative affinities between a large
number of different sequences. There is usefulness in taking either approach
for detecting and/or designing new therapeutic agents. Section VI.C.3,
"Theoretical Considerations for Choosing Target Sequences", outlines the
theoretical considerations for choosing DNA target sites in a biological
system.

1. Medically Significant Target Sequences.
Few effective antiviral therapeutics are currently available; yet
several potential target sequences for antiviral DNA-binding drugs have been
well-characterized. Furthermore, with the accumulation of sequence data on all
biological systems, including viral genomes, cellular genomes, pathogen genomes
(bacteria, fungi, eukaryotic parasites, etc.), the number of target sites for
DNA-binding drugs will increase greatly in the future.

f% There are numerous methods for identifying medically significant target 

T1 4 5 sequences for DNA-binding drugs, including, but not limited to, the following. 
p * First, medically significant target sequences' are found in pathogens of the 

biological kingdoms, for example in genetic sequences that are key to 
biochemical pathways or physiological processes. Second, a target is 
identified, such as (i) a pathogen involved in an infectious disease, or (ii) a 
50 biochemical pathway or physiological process of a noninfectious disease, 
genetic condition, or other biological process. Then specific genes important 
for the survival of the pathogen or modulation of the endogenous pathway 
involved in the target system are identified. Third, specific target sequences 
are identified that affect the expression or activity of a DNA molecule, such 
55 as genes or sites involved in replication. 

There are numerous pathogens that are potential targets for DNA-binding 
drugs designed using the methods described in this application. Table I lists 
a number of potential target pathogens. 

Table I: Pathogens

VIRUSES 

Retroviruses 
Human 

HIV I, II

HTLV I, II 

Animal 

SIV 

STLV I 
FELV 
FIV 
BLV 

BIV (Bovine immunodeficiency virus) 
Lentiviruses 

Avian reticuloendotheliosis virus 



Avian sarcoma and leukosis viruses 
Caprine arthritis-encephalitis 
Equine infectious anemia virus 
Maedi/visna of sheep 
MMTV (mouse mammary tumor virus) 
Progressive pneumonia virus of sheep 
Herpesviridae
Human 

EBV 
CMV 

HSV I, II 

VZV 

HHV6

Cercopithecine Herpes Virus (B Virus)

Old world monkeys with infection into humans. 
Animal 

Bovine Mammillitis virus 

Equine Herpes virus 

Equine coital exanthema virus 

Equine rhinopneumonitis virus 

Infectious bovine rhinotracheitis virus 

Marek's disease virus of fowl 

Turkey herpesvirus 

Hepadna viruses 

Human 

HBV/HDV 

Animal 

Duck Hepatitis 
Woodchucks 

Squirrels 

Poxviridae 
Human 

Orf virus 
Cow Pox 
Variola virus 
Vaccinia 
Small Pox 
Pseudocowpox

Animal
Bovine papular stomatitis virus
Cowpox virus
Ectromelia virus (mouse pox)
Fibroma viruses of rabbits/squirrels
Fowlpox
Lumpy skin disease of cattle virus
Myxoma
Pseudocowpox virus
Sheep pox virus
Swine pox

Papovaviridae
Human
BK virus
SV-40
JC virus
Human Papillomaviruses 1-58 (see list Fields)
Animal
Lymphotropic papovavirus (LPV) Monkey
Bovine papillomavirus
Shope papillomavirus

Adenoviridae
Human
Adenoviruses 1-4
Animal
Canine adenoviruses 2

Parvoviridae
Human
AAV (Adeno Associated Virus)
B19 (human)
Animal
FPV (Feline parvovirus)
PPV (Porcine parvovirus)
ADV (Aleutian disease, mink)
Bovine Parvovirus
Canine Parvovirus
Feline panleukopenia virus
Minute virus of mice
Mink enteritis virus

Streptococcus
pneumoniae
bovis
Group A Streptococci
Agents responsible for:
Streptococcal pharyngitis
Cervical adenitis
Otitis media
Mastoiditis
Peritonsillar abscesses
Meningitis
Peritonitis
Pneumonia
Acute glomerulonephritis
Rheumatic fever
Erythema nodosum

Staphylococcus
aureus
epidermidis
saprophyticus
cohnii
haemolyticus

xylosus 

warneri 



capitis 

hominis

simulans

saccharolyticus 

auricularis 

Agents responsible for: 

Furuncles

Carbuncles 

Osteomyelitis 

Deep tissue abscesses 

Wound infections 

Pneumonia 

Empyema 

Pericarditis 

Endocarditis 

Meningitis 

Purulent arthritis 

Enterotoxin in food poisoning 

Branhamella catarrhalis

Neisseria

gonorrhoeae
lactamica
sicca
subflava

mucosa

flavescens
cinerea
elongata
canis

meningitidis

Enteric Bacilli and Similar Gram-Negative Bacteria

Escherichia 

Proteus 

Klebsiella

Pseudomonas aeruginosa 

Enterobacter 

Citrobacter 

Proteus 



Providencia 
Bacteroides 
Serratia 

Pseudomonas (not aeruginosa)

Acinetobacter 

Salmonella 

Shigella 

Aeromonas 

Moraxella 

Edwardsiella 

Ewingella 

Hafnia 

Kluyvera

Morganella 

Plesiomonas 

Pseudomonas 

aeruginosa 

putida 

pseudomallei 

mallei 



Haemophilus 

ducreyi 
influenzae 

parainfluenzae

Bordetella pertussis 

Yersinia 

pestis (plague) 

pseudotuberculosis 

enterocolitica

Francisella tularensis 

Pasteurella multocida 

Vibrio 

cholerae 

parahaemolyticus

fluvialis 
furnissii 

mimicus 

Brucella 

melitensis 

abortus 

suis 

canis 

Bartonella bacilliformis 

Gardnerella vaginalis 

Borrelia 

recurrentis 
hermsii

duttoni 

crocidurae 

burgdorferi (Lyme disease) 

Bacillus 

anthracis 

cereus 

megaterium 

subtilis 

sphaericus 

circulans 

brevis 

lentiformis 

macerans 

pumilus 

thuringiensis 

larvae 

lentimorbus 

popilliae 

Streptobacillus moniliformis (rat bite fever) 

Spirillum minus (rat bite fever) 

Rothia dentocariosa 

Kurthia 

Clostridium 

botulinum 

novyi

bifermentans 

histolyticum

ramosum

tetani

perfringens

novyi 

septicum 

Campylobacter 

jejuni

fetus 



hyointestinalis
fennelliae 

cinaedi 

Corynebacterium
ulcerans 

pseudotuberculosis 
JK 

diphtheriae 

Legionella 

pneumophila 

bozemanii

micdadei

bosenamii 

feeleii

many others

Mycobacterium 

tuberculosis 

africanum

bovis 

leprae 

avium complex 
kansasii 

fortuitum complex 

scrofulaceum 

marinum 

ulcerans 

Actinomyces 

Bacteroides 

fragilis

Fusobacterium

necrophorum

nucleatum

Peptostreptococcus

Arachnia

Bifidobacterium

Propionibacterium

Nocardia

Treponema pallidum (syphilis)

Rickettsiae
Typhus 

R. prowazeki (epidemic) 

R. prowazeki (Brill's disease) 

R. typhi (endemic) 
Spotted fever 

R. rickettsii

R. sibiricus

R. conorii

R. australis 

R. akari 
Scrub typhus 

R. tsutsugamushi 
Q fever 

Coxiella burnetii 
Trench fever 

Rochalimaea quintana 

Chlamydiae 

C. trachomatis 
(blindness, pelvic inflammatory disease, LGV) 

Mycoplasma 

pneumoniae 

Ureaplasma urealyticum 

Cardiobacterium hominis 

Actinobacillus actinomycetemcomitans 



Kingella

Capnocytophaga 

Pasteurella multocida 

Leptospira interrogans 

Listeria monocytogenes 

Erysipelothrix rhusiopathiae

Streptobacillus moniliformis 

Calymmatobacterium granulomatis 

Bartonella bacilliformis 

Francisella tularensis 

Salmonella typhi 

FUNGAL

Actinomyces 

israelii 

naeslundii 

viscosus 

odontolyticus 

meyeri 

pyogenes 

Cryptococcus neoformans 

Blastomyces dermatitidis 

Histoplasma capsulatum 

Coccidioides immitis 

Paracoccidioides brasiliensis 

Candida 

albicans 

tropicalis 

(Torulopsis) glabrata 

parapsilosis 

Aspergillus 

fumigatus 

flavus 

niger 

terreus

Rhinosporidiosis seeberi 

Phycomycetes 

Sporothrix schenckii

Mucorales

Entomophthorales

Agents of Chromoblastomycosis

Microsporum

M. audouinii (ring worm)

M. canis

M. gypseum

Trichophyton

T. schoenleinii (favus-ringworm)

T. violaceum (hair)

T. tonsurans (hair)

T. mentagrophytes (athlete's foot)

T. rubrum (athlete's foot)

Malassezia furfur 

Cladosporium 

werneckii 

carrioni 

Fonsecaea 

pedrosoi 

compacta 

Phialophora verrucosa 

Rhinocladiella aquaspersa 

Trichosporon cutaneum 

Piedraia hortai 



Ascomycota

Basidiomycota

Deuteromycota

Nocardia
brasiliensis
caviae
asteroides

PARASITIC PATHOGENS

Plasmodium (malaria)
falciparum
ovale

Schistosoma
japonicum
mansoni
haematobium
intercalatum
mekongi

Trypanosoma
brucei gambiense
brucei rhodesiense
evansi
cruzi
equiperdum
congolense

Entamoeba histolytica

Naegleria fowleri

Acanthamoeba
astronyxis
castellanii
culbertsoni
hatchetti
palestinensis
polyphaga
rhysodes

Leishmania
donovani
infantum
chagasi
tropica
major
aethiopica
mexicana
braziliensis
peruviana

Pneumocystis carinii (interstitial pneumonia)

Babesia (tick-borne hemoprotozoan)
microti
divergens

Giardia lamblia

Trichomonas (venereal disease)
vaginalis
hominis
tenax

Cryptosporidium parvum (intestinal protozoan)

Isospora belli (dysentery)

Balantidium coli (protozoon induced dysentery)

Dientamoeba fragilis

Blastocystis hominis

Trichinella spiralis (parasitic nematode)

Wuchereria bancrofti (lymphatic filariasis)



Brugia (lymphatic filariasis) 
malayi 

timori 

Loa loa (eye worm) 

Onchocerca volvulus 

Mansonella 

perstans

ozzardi 

streptocerca 

Dirofilaria immitis

Angiostrongylus cantonensis 

costaricensis 

malayensis 

mackerrasae 

Anisakis (nematode)
simplex

typica

Pseudoterranova decipiens

Gnathostoma spinigerum

Dracunculus medinensis (filarial parasite, guinea worm)

Trichuris trichiura (whip worm) 

Ascaris lumbricoides (nematode) 

Toxocara canis (nematode round worms) 

Necator americanus (heart worm)

Ancylostoma (hook worm)
duodenale 
ceylanicum 

americanus 

members of the species Trichostrongylus 

Strongyloides (intestinal nematode) 
stercoralis 

fuelleborni 

Capillaria philippinensis (intestinal nematode) 

Various species of Paragonimus (lung fluke disease)

Various species of Microsporida

Clonorchis sinensis (liver fluke)

Fasciola (trematode, intestinal worm)

hepatica

gigantica

Fasciolopsis buski

Heterophyes heterophyes

Metagonimus yokogawai

Taenia 

saginata (beef tapeworm)

solium (pork tapeworm) 

Hymenolepis (dwarf tapeworm) 
nana 

nana fraterna 

diminuta

Dipylidium caninum (tapeworm of dogs and cats) 

Diphyllobothrium (fish tapeworms) 

latum

dalliae 

nihonkaiense 

pacificum 

Echinococcus (tape worm with cysts) 
granulosus 
multilocularis 

vogeli

Enterobius vermicularis (Pin worm)

In addition to pathogens, many non-infectious diseases may be controlled
at the level of DNA. These diseases are therefore potential candidates for
treatment with DNA-binding therapeutics that are discovered or designed using
the methods described in this application. Table II lists a number of
potential non-infectious diseases that may be targeted for treatment using
DNA-binding molecules.

Table II: Noninfectious Diseases



CANCER 



Lung 



Adenocarcinoma 
Squamous cell 
Small cell 



Breast carcinoma 



Ovarian 

Serous tumors 
Mucinous tumors 
Endometrioid carcinoma 



Endometrial carcinoma 



Colon carcinoma 



Malignant Melanoma 



Prostate carcinoma 



Lymphoma 

Hodgkins 
Non-Hodgkin 's 



Leukemia 

Chronic Myelogenous 
Acute Myelogenous 
Chronic Lymphocytic 
Acute Lymphocytic 



Cervical carcinoma 



Seminoma 



Multiple Myeloma 



Bladder carcinoma 



Pancreatic carcinoma 



Stomach carcinoma 



Thyroid 

Paprl-lary -adenocarcinoma — — 

Follicular carcinoma 
Medullary carcinoma 



Oral & Pharyngeal carcinomas 



Laryngeal carcinoma 



Bladder carcinoma 



Renal cell carcinoma 



Hepatocellular carcinoma 



Glioblastoma 



Astrocytoma 



Meningioma 



Osteosarcoma 



Pheochromocytoma 



CARDIOVASCULAR DISEASES 



Hypertension 
Essential 
Malignant 



Acute Myocardial Infarction 



Stroke 

Ischemic 
Hemorrhagic 



Angina Pectoris 



Unstable angina 



Congestive Heart Failure



Supraventricular arrhythmias



Ventricular arrhythmias 



Deep Venous Thrombosis 



Pulmonary Embolism 



Hypercholesterolemia 



Cardiomyopathy 



Hypertriglyceridemia 



RESPIRATORY DISORDERS 



Allergic rhinitis 



Asthma 



Emphysema 



Chronic bronchitis 



Cystic Fibrosis 



Pneumoconiosis 



Respiratory distress syndrome 



Idiopathic pulmonary fibrosis 



Primary pulmonary hypertension 



GASTROINTESTINAL DISORDERS 



Peptic ulcers 



Cholelithiasis 



Ulcerative colitis 



Crohn's disease 



Irritable Bowel Syndrome 



Gastritis 



Gilbert's syndrome 



Nausea 



ENDOCRINE/METABOLIC DISORDERS



Diabetes mellitus type I 



Diabetes mellitus type II 



Diabetes insipidus 



Hypothyroidism 



Hyperthyroidism 



Gout 



Wilson's disease 



Addison's disease 



Cushing's syndrome 



Acromegaly 



Dwarfism 



Prolactinemia 



Morbid obesity 



Hyperparathyroidism 



Hypoparathyroidism 



Osteomalacia 



RHEUMATOLOGY/IMMUNOLOGY DISORDERS

Transplant rejection 

Systemic lupus erythematosus 

Rheumatoid arthritis 

Temporal Arteritis 

Amyloidosis 

Sarcoidosis 

Sjogren's Syndrome 

Scleroderma 

Ankylosing spondylitis 

Polymyositis 

Reiter's Syndrome 

Polyarteritis nodosa 

Kawasaki's disease

HEMATOLOGIC DISORDERS

Anemia 

Sickle cell 
Sideroblastic 
Hereditary spherocytosis 
Aplastic 

Autoimmune hemolytic anemia 

Thalassemia

Disseminated intravascular coagulation 

Polycythemia vera 

Thrombocytopenia 

Thrombotic thrombocytopenic purpura 

Idiopathic thrombocytopenic purpura 

Hemophilia

von Willebrand's disease 

Neutropenia 

Post-chemotherapy

Post-radiation 



NEUROLOGIC DISORDERS

Alzheimer's disease 

Parkinson's disease 

Myasthenia gravis 

Multiple sclerosis 

Amyotrophic lateral sclerosis 

Epilepsy

Headaches 

Migraine 

Cluster 

Tension 

Guillain-Barre syndrome 

Pain (post-op, trauma)

Vertigo 

PSYCHIATRIC DISORDERS

Anxiety

Schizophrenia

Substance abuse

Manic-Depression

Anorexia 

DERMATOLOGIC DISORDERS

Acne 

Psoriasis

Eczema

Contact dermatitis

Pruritus

OPHTHALMIC DISORDERS

Glaucoma 

Allergic conjunctivitis 

Macular degeneration 

MUSCULOSKELETAL DISORDERS

Osteoporosis

Muscular dystrophy 

Osteoarthritis 

GENETIC DISORDERS

Down's syndrome

Marfan's syndrome

Neurofibromatosis

Tay-Sachs disease

Gaucher's disease

Niemann-Pick disease 

GENITAL-URINARY DISORDERS

Benign prostatic hypertrophy

Polycystic kidney disease

Non-infectious glomerulonephritis

Goodpasture's syndrome 

Urolithiasis 

Endometriosis 

Impotence 

Infertility 

Fertility control 

Menopause 



Once a disease or condition is identified as a potential candidate for
treatment with a DNA-binding therapeutic, specific genes or other DNA sequences
that are crucial for the expression of the disease-associated gene (or for the
survival of a pathogen) are identified within the biochemical or physiological
pathway (or the pathogen). In humans, many genes involved in important
biological functions have been identified. Virtually any DNA sequence is a
potential target site for a DNA-binding molecule, including mRNA coding
sequences, promoter sequences, origins of replication, and structural
sequences, such as telomeres and centromeres. One class of sites that may be
preferable is the recognition sequences for proteins that are involved in the
regulation or expression of genetic material. For this reason, the
promoter/regulatory regions of genes also provide potential target sites
(Table III; see also Example 15).



Table III: Human Genes with Promoter Regions that 
are Potential Targets for DNA-Binding Molecules 

* [LOCUS Names are from EMBL database ver. 33. 1992.] 


LOCUS Names* 


Locus Description 


>HS5FDX 


Human ferredoxin gene, 5' end. 


>HSA1-ATCA 


Human macrophage alpha1-antitrypsin cap site
region


>HSA1GPB1 


Human gene B for alpha 1-acid glycoprotein 
exon 1 and 5' flank


>HSA1MBG1 


Human gene for alpha-1-microglobulin-bikunin,
exons 1-5 (encoding


>HSA2MGLB1 


H. sapiens gene for alpha-2 macroglobulin, 
exon 1 


>HSACAA1 


H. sapiens ACAA gene (exons 1 & 2) for
peroxisomal 3-oxoacyl-CoA


>HSACCOA 


Homo sapiens choline acetyltransferase gene
sequence. 


>HSACEB 


Human angiotensin I-converting enzyme (ACE) 
gene, 5' flank. 


>HSACHG1 


Human gene fragment for the acetylcholine 
receptor gamma subunit 


>HSACT2CK1 


Human cytokine (Act-2) gene, exon 1. 


>HSACTBPR 


Human beta-actin gene 5' -flanking region 


>HSACTCA 


Human cardiac actin gene, 5' flank. 


>HSACTSA 


Human gene for vascular smooth muscle 
alpha-actin (ACTSA), 5'


>HSACTSG1 


Human enteric smooth muscle gamma-actin gene,
exon 1.


>HSAD12L


Human arachidonate 12-lipoxygenase gene, 5'
end.


>HSADH1X


Human alcohol dehydrogenase alpha subunit 
(ADH1) gene, exon 1. 


>HSADH2X 


Human alcohol dehydrogenase beta subunit 
(ADH2) gene, exon 1. 


>HSAFPCP 


Human alpha-fetoprotein gene, complete cds.


>HSAK1 


Human cytosolic adenylate kinase (AK1) gene, 
complete cds. 


>HSAGAL 


Human alpha-N-acetylgalactosaminidase (NAGA)
gene, complete cds. 


>HSALADG 


H. sapiens ALAD gene for porphobilinogen 
synthase 


>HSALBENH 


Human albumin gene enhancer region. 


>HSALDA1 


Human aldolase A gene 5' non-coding exons 


>HSALDCG 


Human aldolase C gene for 
fructose-1, 6-bisphosphate aldolase 


>HSALDOA 


Human aldolase A gene (EC 4.1.2.13)


>HSALDOBG 


Human DNA for aldolase B transcription start 
region 


>HSALIFA 


Human leukemia inhibitory factor (LIF) gene, 
complete cds. 


>HSAMINON 


Human aminopeptidase N gene, complete cds. 


>HSAMY2A1 


Human alpha-amylase (EC 3.2.1.1) gene AMY2A 
5' flank and exon 1


>HSAMYB01 


Human amyloid-beta protein (APP) gene, exon 
1. 1154 


>HSANFG1 


Human gene fragment for pronatriodilatin 
precursor (exons 1 and 2) 


>HSANFPRE 


Human gene for atrial natriuretic factor 
(hANF) precursor 


>HSANFZ1 


Human atrial natriuretic factor gene, 
complete cds. 


>HSANGG1 


Human angiotensinogen gene 5' region and exon 
1 


>HSANT1 


Human heart/skeletal muscle ATP/ADP 
translocator (ANT1) gene, 


>HSAPC3A 


Human apolipoprotein CIII gene and apo AI-apo
intergenic


>HSAPC3G 


Human gene for apolipoprotein C-III 


>HSAPOA2 


Human gene for apolipoprotein AII


>HSAPOAIA 


Human fetal gene for apolipoprotein AI 
precursor 


>HSAPOBPRM 


Human apoB gene 5' regulatory region 
(apolipoprotein B) 


>HSAPOC2G 


Human apoC-II gene for preproapolipoprotein 
C-II 


>HSAPOCIA 


Human apolipoprotein C-I (VLDL) gene, 
complete cds. 


>HSAPOLIDG 


H. sapiens promoter region of gene for 
apolipoprotein D 


>HSARG1 


Human arginase gene exon 1 and flanking 
regions (EC 3.5.3.1) 


>HSASG5E 


Human argininosuccinate synthetase gene 5'
end 1105


>HSATP1A3S


Human sodium/potassium ATPase alpha 3 subunit
(ATP1A3) gene, 5'


>HSBSF2 


Human (BSF-2/IL6) gene for B cell stimulatory 
factor-2 


>HSC5GN 


Human C5 gene, 5' end. 650


>HSCAII 


Human gene fragment for carbonic anhydrase II 
(exons 1 and 2) 


>HSCALCAC 


Human calcitonin/alpha-CGRP gene 



>HSCALRT1 


Human DNA for calretinin exon 1 


>HSCAPG 


Human cathepsin G gene, complete cds. 


>HSCAVII1 


H. sapiens carbonic anhydrase VII (CA VII)
gene, exon 1


>HSCBMYHC 


Human gene for cardiac beta myosin heavy 
chain 


>HSCD3AA 


Human complement C3 protein mRNA, 5' flank.


>HSCD4


Human recognition/surface antigen
(CD4) gene, 5' end.


>HSCD44A 


Human hyaluronate receptor (CD44) gene, exon 
1. 


>HSCFTC 


Human cystic fibrosis transmembrane 
conductance regulator gene, 5' 


>HSCH7AHYR 


Human cholesterol 7-alpha-hydroxylase (CYP7)
gene, 5' end.


>HSCHAT 


Human gene for choline acetyltransferase (EC
2.3.1.6), partial 


>HSCHYMASE 


Human mast cell chymase gene, complete cds. 


>HSCHYMB 


Human heart chymase gene, complete cds. 3279 


>HSCKBG 


Human gene for creatine kinase B (EC 2.7.3.2) 


>HSCNP 


Human C-type natriuretic peptide gene, 
complete cds. 


>HSCD59011 


Human transmembrane protein (CD59) gene, exon 
1. 


>HSCDPRO 


Human myeloid specific CD11b promoter DNA.


>HSCETP1 


Human cholesteryl ester transfer protein
(CETP) gene, exons 1 and


>HSCFTC 


Human cystic fibrosis transmembrane 
conductance regulator gene, 5' 


>HSCOSEG 


H. sapiens coseg gene for
vasopressin-neurophysin precursor


>HSCREKIN 


Human creatine kinase gene, exon 1. 


>HSCRYABA 


Human alpha-B-crystallin gene, 5' end. 


>HSCS5P 


Human C3 gene, 5' end.


>HSCSF1G1 


Human gene for colony stimulating factor 
CSF-1 5' region


>HSCSPA 


Human cytotoxic serine proteinase gene, 
complete cds. 


>HSCST3G 


Human CST3 gene for cystatin C 


>HSCST4 


H. sapiens CST4 gene for Cystatin D 


>HSCYP2C8 


Human CYP2C8 gene for cytochrome P-450, 5'
flank and exon 1 


>HSCYP45A 


Human gene for cholesterol desmolase 
cytochrome P-450 (SCC) exon 1 


>HSCYPB1 


Human steroid 11-beta-hydroxylase (CYP11B1) 
gene, exons 1 and 2. 


>HSCYPXI 


Human CYPXI gene for steroid 18-hydroxylase 
(P-450 C18). 2114


>HSCYPXIB1 


Human CYPXIB gene for steroid
11beta-hydroxylase (P-450 11beta),


>HSCYPXIX 


Human CYPXIX gene, exon 1 coding for 
aromatase P-450 (EC 1.14.14.1) 


>HSDAFC1 


Human decay-accelerating factor (DAF) gene, 
exons 1 and 2.


>HSDBH1 


Human DNA for dopamine beta-hydroxylase exon 
1 (EC 1.14.17.1) 


>HSDES 


Human desmin gene, complete cds. 


>HSDKERB 


Human cytokeratin 8 (CK8) gene, complete cds. 


>HSDNAPOL 


Human DNA polymerase alpha gene, 5' end.


>HSDOPAM


H. sapiens dopamine D1A receptor gene,
complete exon 1, and exon 2,


>HSECP1 


Human DNA for eosinophil cationic protein ECP 


>HSEGFA1 


Human HER2 gene, promoter region and exon 1. 


>HSEL20 


Human elastin gene, exon 1. 


>HSELAM1B 


Human endothelial leukocyte adhesion molecule 
I (ELAM-1) gene,


>h c ;fmbpa 


Human eosinophil major basic protein gene, 
complete cds.


>HSFNKB1 


Human preproenkephalin B gene 5' region and 
exon 1 




Human ENO3 gene 5' end for muscle-specific
enolase 




Human DNA for eosinophil derived neurotoxin 




Human erythropoietin receptor mRNA sequence 
derived from DNA, 5' 


>HSERB2P 


Human c-erb B2/neu protein gene, 5' end, and 
promoter region. 


>HSERCC25 


Human genomic and mRNA sequence for ERCC2 
gene 5' region involved in


>HSERPA 


Human erythropoietin gene, complete cds.


>HSERR 


Human mRNA for oestrogen receptor 


>HSESTEIl 


H. sapiens exon 1 for elastase I


>HSFBRGG 


Human gene for fibrinogen gamma chain 


>HSFCERG5 


Human lymphocyte IgE receptor gene 5'-region
(Fc-epsilon R) 


>HSFERG1 


Human apoferritin H gene exon 1 


SU C TTT HUB 1 


Human fibrinogen beta gene 5' region and exon
1




Human factor IX gene, complete cds.


/nit i\Dr x 


Human FK506 binding proteins 12A, 12B and 12C 
(FKBP12) mRNA, exons 




Human 5-lipoxygenase activating protein 
(FLAP) gene, exon 1. 


>HSFOS 


Human fos proto-oncogene (c-fos), complete 
cds . 




Human G0S2 gene, upstream region and cds. 


>HSGCSFG 


Human gene for granulocyte colony-stimulating
factor (G-CSF)


>HSGEGR2 


Human EGR2 gene 5' region 1233 


suc^u PROM 


Human growth hormone (hGH) gene promoter 


>HSGIPX1 


Human gastric inhibitory polypeptide (GIP) 
mRNA, exon 1.


>HSGLA 


Human GLA gene for alpha-D-galactosidase A 
(EC 3.2.1.22)


>HSGLUC1 


Human glucagon gene transcription start 
region 732 


>HSGMCSFG 


Human gene for granulocyte-macrophage colony 
stimulating factor 




Human glucocorticoid receptor gene, exon 1. 
1602 




Human growth hormone-releasing factor (GRF) 
gene, exon 1 (complete) 


>HSGSTP15 


Human GST pi gene for glutathione 
S-transferase pi exon 1 to 5 


>HSGTRH 


Human gene for gonadotropin-releasing hormone 


>HSGYPC 


Human glycophorin C (GPC) gene, exon 1, and 
promoter region. 


>HSH10 


Human histone (H10) gene, 5' flank. 


>HSH1DNA 


Human gene for HI RNA 1057 


>HSH1FNC1 


Human HI histone gene FNC16 promoter region 



>HSH2B2H2 


Human H2B.2 and H2A.1 genes for Histone H2A
and H2B 


>HSH4AHIS 


H. sapiens H4/a gene for H4 histone 


>HSH4BHIS 


H. sapiens H4/b gene for H4 histone 


>HSHARA 


Human androgen receptor gene, transcription 
initiation sites. 


>HSHCG5B1 


Human chorionic gonadotropin (hCG) beta 
subunit gene 5 5 '-flank 


>HSHEMPRO 


Human DNA for hemopexin promoter


>HSHIAPPA 


Human islet amyloid polypeptide (hIAPP) gene, 
complete cds . 


>HSHIH4 


Human H4 histone gene 


>HSHISH2A 


Human histone H2a gene 


>HSHISH2B 


Human histone H2b gene 


>HSHISH3 


Human histone H3 gene 


>HSHLAA1 


Human HLA-A1 gene 


>HSHLAB27 


Human gene for HLA-B27 antigen 


| X Oil l ifii— 111 


Human HLA-Bw57 gene 


>HSHLAF 


Human HLA-F gene for human leukocyte antigen 
F 




Human gene for histocompatibility antigen 
HLA-A3 


>HSHLIC 


Human gene for class I histocompatibility 
antigen HLA-CW3 


>HSHMG17G 


Human HMG-17 gene for non-histone chromosomal 
protein HMG-17 


>HSH0X3D 


Human HOX3D gene for homeoprotein HOX3D


^ n o no i \j 


Human hsc70 gene for 71 kd heat shock cognate 
protein 


•>uc;HCp7nn 


Human heat shock protein (hsp 70) gene, 
complete cds . 


>ucucp7np 

^ n o iiij l i \j u 


Human hsp70B gene 5'-region




Human IAPP gene exon 1 and exon 2 for islet 
amyloid polypeptide 


>HSICAMAB 


Human intercellular adhesion molecule 1 
(ICAM-1) gene, exon 1. 


>HSIFI54 


Human interferon-inducible gene IFI-54K 
5' flank


>HSIFNA14 


Human interferon alpha gene IFN-alpha 14 


>HSIFNA16 


Human interferon alpha gene IFN-alpha 16 


>HSIFNA5 


Human interferon alpha gene IFN-alpha 5 


>HSIFNA6 


Human interferon alpha gene IFN-alpha 6


>HSIFNA7 


Human interferon alpha gene IFN-alpha 7 


>HSIFNG 


Human immune interferon (IFN-gamma) gene. 


>HSIFNIN6 


Human alpha/beta-interferon (IFN)-inducible
6-16 gene exon 1 and 


>HSIGF24B 


Human DNA for insulin-like growth factor II 
(IGF-2) ; exon 4B 


>HSIGFBP1A 


Human insulin-like growth factor binding 
protein (hIGFBP1) gene


>HSIGK10 


Human germline gene for the leader peptide 
and variable region 


>HSIGK15 


Human germline gene for the leader peptide 
and variable region 


>HSIGK17 


Human rearranged gene for kappa
immunoglobulin subgroup V kappa IV


>HSIGK20 


Human rearranged DNA for kappa immunoglobulin 
subgroup V kappa III 


>HSIGKLC1 


Human germline fragment for immunoglobulin 
kappa light chain 


>HSIGVA5 


Human germline immunoglobulin kappa light
chain V-segment


>HSIL05 


Human interleukin-2 (IL-2) gene and 
5'-flanking region


>HSIL1AG 


Human gene for interleukin 1 alpha (IL-1
alpha)


>HSIL1B 


Human gene for prointerleukin 1 beta 


>HSIL2RG1 


Human interleukin 2 receptor gene 5' flanking 
region and exon 1 


>HSIL45 


Human interleukin 4 gene 5'-region


>HSIL5 


Human interleukin 5 (IL-5) gene, complete 
cds . 


>HSIL6B 


Human interleukin 6 (IL 6) gene, 5' flank.


>HSIL71 


Human interleukin 7 (IL7) gene, exon 1. 


>HSIL9A 


Human IL9 protein gene, complete cds. 


>HSINSU 


Human gene for preproinsulin, from chromosome 
11. Includes a highly- 


>HSINT1G 


Human int-1 mammary oncogene 


>HSJUNCAA 


Human jun-B gene, complete cds. 


>HSKER65A 


Human DNA for 65 kD keratin type II exon 1 
and 5' flank 


>HSKERUHS 


Human gene for ultra high-sulphur keratin 
protein 


>HSLACTG 


Human alpha-lactalbumin gene 


>HSLAG1G 


Human LAG-1 gene


>HSLCATG 


Human gene for lecithin-cholesterol 
acyltransferase (LCAT)


>HSLCK1 


Human lymphocyte-specific protein tyrosine 
kinase (lck) gene 


>HSLFACD 


Human leukocyte function-associated antigen-1 
(LFA-1 or CD11a)


>HSLPLA 


Human lipoprotein lipase (LPL) gene, 5'
flank. 


>HSLYAM01 


Human leukocyte adhesion molecule-1 (LAM-1),
exon 1.


>HSLYSOZY 


Human lysozyme gene (EC 3.2.1.17) 


>HSMBP1A 


Human DNA for mannose binding protein 1
(MBP1), exon 1


>HSMCCPAA


Human mast cell carboxypeptidase A (MC-CPA)
gene, exons 1-2.


>HSMDR1 


Human P-glycoprotein (MDR1) mRNA, complete 
cds . 


>HSMED 


Human bone marrow serine protease gene 
(medullasin) 


>HSMEHG 


Human DNA (exon 1) for microsomal epoxide 
hydrolase 


>HSMETIE 


Human metallothionein-Ie gene (hMT-Ie).


>HSMG01 


Human myoglobin gene (exon 1) 


>HSMGSAG 


Human gene for melanoma growth stimulatory 
activity (MGSA) 


>HSMHCAG1 


Human alpha-MHC gene for myosin heavy chain 
N-terminus) 


>HSMHCGE1 


Human class II invariant gamma-chain gene (5'
flank, exon 1) 


>HSMHCW5 


Human MHC class I HLA-Cw5 gene, 5' flank. 


>HSMLN1 


Human motilin gene exon 1 


>HSMPOA 


Human myeloperoxidase gene, exons 1-4. 


>HSMRP 


Human mitochondrial RNA-processing 
endoribonuclease RNA (mrp) gene 


>HSMTS1A 


H. sapiens mts1 gene, 5' end.


>HSMYCE12 


Human myc-oncogene exon 1 and exon 2 


>HSNAKATP 


Human Na,K-ATPase beta subunit (ATP1B) gene,
exons 1 and 2.


>HSNEURK1 


H. sapiens gene for neuromedin K receptor 
(exon 1) 


>HSNFH1 


Human gene for heavy neurofilament subunit 
(NF-H) exon 1 


>HSNFIL6 


Human gene for nuclear factor NF-IL6 


>HSNFLG 


Human gene for neurofilament subunit NF-L 


>HSNK21 


Human neurokinin-2 receptor (NK-2) gene, exon 
1. 


>HSNMYC 


Human germ line n-myc gene 


>HSNRASPR 


H. sapiens N-RAS promoter region 


>HS0DC1A 


Human ornithine decarboxylase (ODC1) gene,
complete cds.


>HS0TCEX1 


Human ornithine transcarbamylase (OTC) gene, 
5'-end region.


>HSOTNPI 


Human prepro-oxytocin-neurophysin I gene, 
complete cds. 


>HSP4 50SCC 


Human cytochrome P450scc gene, 5' end and
promoter region. 


>HSP53G 


Human p53 gene for transformation related 
protein p53 


>HSPADP 


Human promoter DNA for Alzheimer's disease 
amyloid A4 precursor 


>HSPAI11 


Human gene for plasminogen activator 
inhibitor 1 (PAI-1) 5' -flank 


>HSPGDF 


Human platelet-derived growth factor A-chain 
(PDGF) gene, 5' end


>HSPGP95G 


Human PGP9.5 gene for neuron-specific 
ubiquitin C-terminal 


>HSPLSM 


Human plasminogen gene, exon 1. 


>HSPNMTB 


Human gene for phenylethanolamine N-methylase 
(PNMT) (EC 2.1.1.28) 


>HSP0MC5F 


Human opiomelanocortin gene, 5' flank. 


>HSPP14B 


Human placental protein 14 (PP14) gene, 
complete cds . 


>HSPRB3L 


Human gene PRB3L for proline-rich protein G1


>HSPRB4S 


Human PRB4 gene for proline-rich protein Po,
allele S


>HSPRLNC 


Human prolactin mRNA, partial cds. 


>HSPR0AA1 


Human prothymosin-alpha gene, complete cds. 


>HSPR0T2 


Human protamine 2 gene, complete cds. 


>HSPRPE1 


Human SPR2-1 gene for small proline rich 
protein (exon 1) 


>HSPS2G1 


Human estrogen-responsive gene pS2 5' flank 
and exon 1 


>HSPSAP 


Human pulmonary surfactant apoprotein (PSAP) 
gene, complete cds. 


>HSPSP94A 


Human gene for prostatic secretory protein 
PSP-94, exon 1 


>HSPTHRPA 


Human parathyroid hormone-related peptide 
(PTHRP) gene, exons 1A, 


>HSPURNPHO 


Human gene for purine nucleoside 
phosphorylase (upstream region)


>HSRDNA 


Human rDNA origin of transcription 


>HSREGA01 


Human regenerating protein (reg) gene, 
complete cds. 


>HSREN01 


Human renin gene 5' region and exon 1 


>HSRPBG1 


Human gene fragment for retinol binding 
protein (RBP) (exon 1-4) 


>HSSAA1A 


Human serum amyloid A (GSAA1) gene, complete
cds.


>HSSAA1B


H. sapiens SAA1 beta gene 


>HSSB4B1 


Human gene fragment for HLA class II SB 
4-beta chain (exon 1) 


>HSSISG5 


Human c-sis proto-oncogene 5' region 


>HSSLIPG 


Human SLPI gene for secretory leukocyte 
protease inhibitor 


>HSS0D1G1 


Human superoxide dismutase (SOD-1) gene exon 
1 and 5' flanking 


>HSSODB 


Human ornithine decarboxylase gene, complete 
cds . 


>HSSRDA01 


H. sapiens steroid 5-alpha-reductase gene, 
exon 1.


>HSSUBP1G 


H. sapiens gene for substance P receptor (exon 
1) 


>HSSYB1A1 


Human synaptobrevin 1 (SYB1) gene, exon 1. 


>HSTAT1 


Human gene for tyrosine aminotransferase 
(TAT) (EC 2.6.1.5) Exon 1. 


>HSTCBV81 


Human T-cell receptor V-beta 8.1 gene 775 


>HSTCRB21 


Human T-cell receptor beta chain gene 
variable region. 


>HSTFG5 


Human transferrin (Tf) gene 5' region


>HSIL3FL5 


Human interleukin 3 gene, 5' flank. 


>HSTFPB 


Human tissue factor gene, complete cds. 


>HSTGFB1 


Human mRNA for transforming growth 
factor-beta (TGF-beta) 


>HSTGFB3B 


Human transforming growth factor beta-3 gene, 
5' end.


>HSTGFBET2 


Human transforming growth factor beta-2 gene, 
5' end.


>HSTH01 


Human tyrosine hydroxylase (TH) (EC 
1.14.16.2) gene from upstream 


>HSTHI02A 


Human metallothionein gene IIA promoter 
region 


>HSTHR001 


Human thrombospondin gene, exons 1, 2 and 3. 


>HSTHXBG 


H. sapiens gene for thyroxine-binding globulin 
gene 


>HSTHYR5 


Human thyroglobulin gene 5' region 


>HSTNFA 


Human gene for tumor necrosis factor
(TNF-alpha)


>HSTNFB 


Human gene for lymphotoxin (TNF-beta) 


>HSTOP01 


Homo sapiens type I DNA topoisomerase gene, 
exons 1 and 2.


>HSTPIA 


Human triosephosphate isomerase (TPI) gene, 
5' end. 


>HSTP05 


Human thyroid peroxidase gene 5' end (EC
1.11.1.7) 


>HSTRP 


Human transferrin receptor gene promoter 
region 


>HSTRPY1B 


Human tryptase-I gene, complete cds. 


>HSTUBB2 


Human beta 2 gene for beta-tubulin 


>HSTYR01E 


Human tyrosinase gene, exon 1 and 5' flanking 
region (EC 1.14.18.1) 


>HSU6RNA 


Human gene for U6 RNA


>HSUPA 


Human uPA gene for urokinase-plasminogen 
activator 


>HSVAVP01 


Human proto-oncogene vav, 5' end. 


>HSVCAM1A 


Human vascular cell adhesion molecule-1 
(VCAM1) gene, complete CDS. 


>HSVIM5RR 


Human vimentin gene 5' regulatory region



Once the gene target or, in the case of small pathogens, the genome 
target has been identified, short sequences within the gene or genome target 
are identified as medically significant target sites. Medically significant 
target sites can be defined as short DNA sequences (approximately 4-30 base
pairs) that are required for the expression or replication of genetic material. 

For example, sequences that bind regulatory factors, either transcriptional or 
replicatory factors, are ideal target sites for altering gene or viral 
expression. 

Further, coding sequences may be adequate target sites for disrupting 
gene function, although the disruption of a polymerase complex that is moving 
along the DNA sequence may require a stronger binder than for the disruption of 
the initial binding of a regulatory protein. 

Finally, even non-coding, non-regulatory sequences may be of interest as 
target sites (e.g., for disrupting replication processes or introducing an
increased mutational frequency).

Several specific examples of medically significant target sites are shown 
in Table IV. 



Table IV 

MEDICALLY SIGNIFICANT DNA-BINDING SEQUENCES 



Test sequence              DNA-binding Protein    Medical Significance

EBV origin of              EBNA                   Infectious mononucleosis,
replication                                       nasal pharyngeal carcinoma

HSV origin of              UL9                    Oral and genital Herpes
replication

VZV origin of              UL9-like               Shingles
replication

HPV origin of              E2                     Genital warts,
replication                                       cervical carcinoma

Interleukin 2              NFAT-1                 Immunosuppressant
enhancer

HIV LTR                    NFAT-1,                AIDS, ARC
                           NFkB

HBV enhancer               HNF-1                  Hepatitis

Fibrinogen                 HNF-1                  Cardiovascular disease
promoter

Oncogene promoter          ??                     cancer
and coding sequences



(Abbreviations: EBV, Epstein-Barr virus; EBNA, Epstein-Barr virus
nuclear antigen; HSV, Herpes Simplex virus; VZV, Varicella zoster virus;
HPV, human papilloma virus; HIV LTR, Human immunodeficiency virus long terminal
repeat; NFAT, nuclear factor of activated T cells; NFkB, nuclear factor
kappaB; AIDS, acquired immune deficiency syndrome; ARC, AIDS related complex;
HBV, hepatitis B virus; HNF, hepatic nuclear factor.)

For example, origin of replication binding proteins have short,
well-defined binding sites within the viral genome and are therefore excellent
target sites for a competitive DNA-binding drug. Examples of such proteins
include Epstein Barr virus nuclear antigen 1 (EBNA-1) (Ambinder, et al.;
Reisman, et al.), E2 (which is encoded by the human papilloma virus) (Chin, et
al.), UL9 (which is encoded by herpes simplex virus type 1) (McGeoch, et al.),
and the homologous protein in varicella zoster virus (VZV) (Stow, et al.).

Similarly, recognition sequences for DNA-binding proteins that act as
transcriptional regulatory factors are also good target sites for antiviral
DNA-binding drugs. Examples listed in Table IV include (i) the binding site
for hepatic nuclear factor (HNF-1), which is required for the expression of
human hepatitis B virus (HBV) (Chang), and (ii) NFkB and NFAT-1 binding sites
in the human immunodeficiency virus (HIV) long terminal repeat (LTR), one or
both of which may be involved in the expression of the virus (Greene, W.C.).

Examples of non-viral DNA targets for DNA-binding drugs are also shown in
Table IV to illustrate the wide range of potential applications for
sequence-specific DNA-binding molecules. For example, nuclear factor of
activated T cells (NFAT-1) is a regulatory factor that is crucial to the
inducible expression of the interleukin 2 gene in response to signals from the
antigen receptor, which, in turn, is required for the cascade of molecular
events during T cell activation (for review, see Edwards, C.A., and Crabtree,
G.R.). The mechanism of action of two immunosuppressants, cyclosporin A and
FK506, is thought to be to block the inducible expression of NFAT-1 (Schmidt,
et al. and Banerji, et al.). However, the effects of these drugs are not
specific to NFAT-1; therefore, a drug targeted specifically to the NFAT-1
binding site in the IL-2 enhancer would be desirable as an improved
immunosuppressant.

Targeting the DNA site with a DNA-binding drug rather than targeting with
a drug that affects the DNA-binding protein (presumably the target of the
current immunosuppressants) is advantageous for at least two reasons: first,
there are many fewer target sites for specific DNA sequences than specific
proteins (e.g., in the case of glucocorticoid receptor, a handful of
DNA-binding sites vs. about 50,000 protein molecules in each cell); and second,
only the targeted gene need be affected by a DNA-binding drug, while a
protein-binding drug would disable all the cellular functions of the protein.
An example of the latter point is the binding site for HNF-1 in the human
fibrinogen promoter. Fibrinogen level is one of the factors most highly
correlated with cardiovascular disease. A drug targeted to either HNF-1 or the
HNF-1 binding site in the fibrinogen promoter might be used to decrease
fibrinogen expression in individuals at high risk for disease because of the
over-expression of fibrinogen. However, since HNF-1 is required for the
expression of a number of normal hepatic genes, blocking the HNF-1 protein
would be toxic to liver function. In contrast, by blocking a DNA sequence that
is composed in part of the HNF-1 binding site and in part by flanking sequences
for divergence, the fibrinogen gene can be targeted with a high level of
selectivity, without harm to normal cellular HNF-1 functions.

The assay has been designed to screen virtually any DNA sequence. Test
sequences of medical significance include viral or microbial pathogen genomic
sequences and sequences within or regulating the expression of oncogenes or
other inappropriately expressed cellular genes. In addition to the detection
of potential antiviral drugs, the assay of the present invention is also
applicable to the detection of potential drugs for (i) disrupting the
metabolism of other infectious agents, (ii) blocking or reducing the
transcription of inappropriately expressed cellular genes (such as oncogenes or
genes associated with certain genetic disorders), and (iii) the enhancement or
alteration of expression of certain cellular genes.

2. Defined Sets of Test Sequences.
The approach described in the above section emphasizes screening large
numbers of fermentation broths, extracts, or other mixtures of unknowns against
specific medically significant DNA target sequences. The assay can also be
utilized to screen a large number of DNA sequences against known DNA-binding
drugs to determine the relative affinity of a single drug for every possible
defined specific sequence. For example, there are 4^n possible sequences, where
n = the number of nucleotides in the sequence. Thus, there are 4^3 = 64
different three base pair sequences, 4^4 = 256 different four base pair
sequences, 4^5 = 1024 different five base pair sequences, etc. If these
sequences are placed in the test site, the site adjacent to the screening
sequence (the example used in this invention is the UL9 binding site), then
each of the different test sequences can be screened against many different
DNA-binding molecules.
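
As an illustrative sketch only (it is not part of the assay itself), the
complete set of candidate test sequences of a given length can be enumerated
in a few lines of Python. The names BASES and all_test_sequences are
hypothetical and chosen for this example; the counts follow directly from the
4^n relationship stated above.

```python
from itertools import product

# The four DNA bases; each test sequence is one string over this alphabet.
BASES = "ACGT"


def all_test_sequences(n):
    """Return every possible DNA sequence of length n (4**n in total)."""
    return ["".join(p) for p in product(BASES, repeat=n)]


if __name__ == "__main__":
    for n in (3, 4, 5):
        # Prints the sequence length and the number of distinct sequences.
        print(n, len(all_test_sequences(n)))
```

Under these assumptions, a set of 256 four base pair test sequences would be
produced by all_test_sequences(4), one entry per oligonucleotide to be
synthesized.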

The test sequences may be placed on either or both sides of the screening
sequence, and the sequences flanking the other side of the test sequences are
fixed sequences to stabilize the duplex and, on the 3' end of the top strand,
to act as an annealing site for the primer (see Example 1). In Figure 14B, the
TEST and SCREENING sequences are indicated. The preparation of such
double-stranded oligonucleotides is described in Example 1 and illustrated in
Figure 4.

The test sequences, denoted in Figure 14B as X:Y (where X = A, C, G, or T
and Y = the complementary sequence, T, G, C, or A), may be any of the 256
different 4 base pair sequences shown in Figure 13.
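
The X:Y notation can be illustrated with a short Python sketch, offered only
as an aid to the reader. The helper names complement and reverse_complement
are assumptions introduced here; the code simply pairs each top-strand base X
with its Watson-Crick partner Y.

```python
# Watson-Crick pairing: each top-strand base X pairs with bottom-strand base Y.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Return the base-by-base complement (Y) of a top-strand sequence (X)."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Return the bottom strand written in its own 5'-to-3' direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(complement("GATC"))
    print(reverse_complement("GATC"))
```

For the top-strand test sequence GATC, the paired bottom-strand bases read
CTAG position by position, and the sequence is its own reverse complement.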
Once a set of test oligonucleotides containing all possible four base
pair sequences has been synthesized (see Example 1), the set can be screened
with any DNA-binding drug. The relative effect of the drug on each
oligonucleotide assay system will reflect the relative affinity of the drug for
the test sequence. The entire spectrum of affinities for each particular DNA
sequence can therefore be defined for any particular DNA-binding drug. This
data, generated using the assay of the present invention, can be used to
facilitate molecular modeling programs and/or be used directly to design new
DNA-binding molecules with increased affinity and specificity.
Another type of ordered set of oligonucleotides that may be useful for
screening is a set comprised of scrambled sequences with fixed base
composition. For example, if the recognition sequence for a protein is
5'-GATC-3' and libraries were to be screened for DNA-binding molecules that
recognized this sequence, then it would be desirable to screen sequences of
similar size and base composition as control sequences for the assay. The most
precise experiment is one in which all possible 4 bp sequences are screened.
In the case of a 4 base-pair sequence, this represents 4^4 = 256 different test
sequences: a number of screening sequences that may not be practical in every
situation. However, there are many fewer possible 4 bp sequences with the same
base composition (1G, 1A, 1T, 1C): n! = 4! = 24 different 4 bp sequences have
this particular base composition. Such sequences provide excellent controls
without having to screen large numbers of sequences.
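
The scrambled control set for a given recognition sequence can be generated as
follows. This is an illustrative sketch only; the function name is hypothetical.
Duplicate permutations are removed, so a site with repeated bases yields fewer
than n! controls.

```python
from itertools import permutations


def scrambled_controls(site):
    """Return every distinct sequence with the same base composition as `site`."""
    return sorted({"".join(p) for p in permutations(site)})


controls = scrambled_controls("GATC")
print(len(controls), controls[:4])
```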

3. Theoretical Considerations in Choosing Biological Target
Sites: Specificity and Toxicity.

One consideration in choosing sequences to screen using the assay
of the present invention is test sequence accessibility, that is, the potential
exposure of the sequence in vivo to binding molecules. Cellular DNA is
packaged in chromatin, rendering most sequences relatively inaccessible.
Sequences that are actively transcribed, particularly those sequences that are
regulatory in nature, are less protected and more accessible to both proteins
and small molecules. This observation is substantiated by a large literature
on DNAase I sensitivity, footprinting studies with nucleases and small
molecules, and general studies on chromatin structure (Tullius). The relative
accessibility of a regulatory sequence, as determined by DNAase I
hypersensitivity, is likely to be several orders of magnitude greater than that
of an inactive portion of the cellular genome. For this reason the regulatory
sequences of cellular genes, as well as viral regulatory or replication
sequences, are useful regions to choose for selecting specific inhibitory small
molecules using the assay of the present invention.
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Another consideration in choosing sequences to be screened using the 
assay of the present invention is the uniqueness of the potential test
sequence. As discussed above for the nuclear protein HNF-1, it is desirable that
small inhibitory molecules are specific to their target with minimal
cross-reactivity. Both sequence composition and length affect sequence
uniqueness. Further, certain sequences are found less frequently in the human
genome than in the genomes of other organisms, for example, mammalian viruses.
Because of base composition and codon utilization differences, viral sequences
are distinctly different from mammalian sequences. As one example, the
dinucleotide CG is found much less frequently in mammalian cells than the
dinucleotide sequence GC; further, in SV40, a mammalian virus, the sequences
AGCT and ACGT are represented 34 and 0 times, respectively. Specific viral
regulatory sequences can be chosen as test sequences keeping this bias in mind.
Small inhibitory molecules identified which bind to such test sequences will be
less likely to interfere with cellular functions.

There are approximately 3 x 10^9 base pairs (bp) in the human genome. Of
the known DNA-binding drugs for which there is crystallographic data, most bind
2-5 bp sequences. There are 4^4 = 256 different 4 base sequences; therefore,
on average, a single 4 bp site is found roughly 1.2 x 10^7 times in the human
genome. An individual 8 base site would be found, on average, about 50,000
times in the genome. On the surface, it might appear that drugs targeted at
even an 8 bp site might be deleterious to the cell because there are so many
binding sites; however, several other considerations must be recognized.
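
The site-frequency estimates above follow from dividing the genome size by the
number of distinct sequences of a given length. A minimal sketch of that
arithmetic is given below; it assumes, as the text implicitly does, a random
genome with equal base frequencies, and the function names are illustrative.

```python
GENOME_BP = 3e9  # approximate size of the human genome, from the text


def distinct_sites(site_len):
    """Number of different sequences of length `site_len`."""
    return 4 ** site_len


def expected_occurrences(site_len, genome_bp=GENOME_BP):
    """Average number of times one particular site occurs in the genome."""
    return genome_bp / distinct_sites(site_len)


for n in (4, 8):
    print(n, distinct_sites(n), round(expected_occurrences(n)))
```

For n = 4 this gives roughly 1.2 x 10^7 occurrences, and for n = 8 roughly
46,000, which the text rounds to about 50,000.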

First, most DNA is tightly wrapped in chromosomal proteins and is
relatively inaccessible to incoming DNA-binding molecules as demonstrated by
the nonspecific endonucleolytic digestion of chromatin in the nucleus (Edwards,
C.A. and Firtel, R.A.). Active transcription units are more accessible, but
the most highly exposed regions of DNA in chromatin are the sites that bind
regulatory factors. As demonstrated by DNAase I hypersensitivity (Gross, D.S.
and Garrard, W.T.), regulatory sites may be 100-1000 times more sensitive to
endonucleolytic attack than the bulk of chromatin. This is one reason for
targeting regulatory sequences with DNA-binding drugs.

Secondly, several anticancer drugs that bind 2, 3, or 4 bp sequences have
sufficiently low toxicity so that they can be used as drugs. This indicates
that, if high affinity binding sites for known drugs can be matched with
specific viral target sequences, it may be possible to use currently available
drugs as antiviral agents at lower concentrations than they are currently used,
with a concomitantly lower toxicity.

4. Further Considerations in Choosing Target Sites: Finding
Eukaryotic Promoters.

Eukaryotic organisms have three RNA polymerases (Pol I, II, and III) that
transcribe genetic information from DNA to RNA. The correct regulation of this
information flow is essential for the survival of the cell. These
multi-subunit enzymes need additional proteins to regulate transcription. Many
of these additional proteins bind to DNA in a region 5' of the translation start
site for a gene: this region is generally known as the promoter region of the
gene.

All three polymerases use a core set of general transcription proteins to
bind to this region. A central component of this complex is the protein called
TBP or TFIID. The site this protein binds to is known as the TATA-box because
the sequence usually contains a sequence motif similar to TATA (e.g.,
TATAa/tAa/t). Originally it was thought that each of the three polymerases
used a separate set of general transcription factors and that Pol II used TFIID
exclusively. Recently it has been shown that all three classes of RNA
polymerase need TFIID for transcriptional regulation (see Comai, et al.; and
Greenblatt).

A molecule that binds to a DNA sequence closely adjacent to or overlapping
a TATA binding site will likely alter transcriptional regulation of the gene. If
the molecule binds based solely on specificity to the TATA-box sequence itself,
then this molecule is expected to be very toxic to cells since the
transcription of most genes would be altered. The sequences adjacent to TATA
boxes, however, are not conserved. Accordingly, if a particular sequence is
selected adjacent to a TATA box of a particular gene, a molecule that binds to
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this specific sequence would be expected to alter the transcriptional 
regulation of just that gene. 

TATA-boxes were first identified by determining the sequence of the DNA
located 5' of the RNA start sites of a number of genes. Examination of these
sequences revealed that most genes had a TATA-box motif (consensus sequence) in
the range of 50 to 15 nucleotides 5' of the RNA start site. In vitro studies,
typically DNA protection (footprinting) studies, led to the conclusion that
proteins were binding to these sites. Further in vitro DNA binding experiments
demonstrated that some proteins could specifically bind to these sites. This
led to assays that allowed purification and subsequent sequencing of the
binding proteins. This information facilitated the cloning and expression of
genes encoding the binding proteins. A large number of transcription factors
are now known. The protein designated TFIID has been demonstrated to bind to
the TATA-box (Lee, et al.).

Molecules that interfere with the interaction of these transcription
factors and their target DNA (i.e., DNA/Protein transcription complexes) are
also expected to alter transcription initiated from the target DNA. A public
database of these factors and the sequences to which they bind is available
from the National Library of Medicine and is called "The Transcription Data
Base, or TFD." The binding sites of these transcription factors can be
identified in the 5' non-coding region of genes having known sequences
(Example 15).
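
As an illustration of locating a factor binding site in a 5' non-coding region,
the sketch below scans for the TATA-like motif quoted earlier (TATAa/tAa/t)
within 50 to 15 nucleotides 5' of the RNA start site. It is a hedged example
only: the function name, the use of a regular expression, and the choice to
measure distance from the first base of the motif are assumptions, and the
method of Example 15 may differ.

```python
import re

# Loose TATA-box pattern taken from the motif quoted in the text (TATAa/tAa/t).
# The lookahead lets overlapping candidates be reported.
TATA_MOTIF = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_candidates(upstream, min_dist=15, max_dist=50):
    """Return (position, motif) pairs for motif hits in the expected window.

    `upstream` is the sequence 5' of the RNA start site, written 5'->3', so its
    last base is position -1. Distance is measured from the first base of the
    motif to the start site (an assumption made for this sketch).
    """
    seq = upstream.upper()
    hits = []
    for match in TATA_MOTIF.finditer(seq):
        distance = len(seq) - match.start(1)
        if min_dist <= distance <= max_dist:
            hits.append((-distance, match.group(1)))
    return hits


# Hypothetical upstream region with a motif 30 nucleotides 5' of the start site.
example_upstream = "G" * 20 + "TATAAAA" + "C" * 23
print(find_tata_candidates(example_upstream))
```

Sequences immediately flanking any reported hit are then candidates for the
gene-specific target sites discussed above.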

The ability to select target sequences adjacent to the binding site of a
transcription factor, as described above for TFIID, can be applied to other
general transcription factors as well. For the purpose of the present
invention, a general transcription factor is one that regulates the
transcriptional expression of more than one gene. For any such general
transcription factor, as for TFIID above, a particular target sequence can be
selected adjacent to the transcription factor binding site of a selected gene. A
molecule that binds to this specific target sequence would be expected to alter
the transcriptional regulation of just that gene and not all of the genes for
which the transcription factor regulates expression. Alteration of
transcriptional regulation may involve inhibition or increased affinity
(enhancement) of binding of a transcription factor to its cognate DNA.

Many examples of such general transcription factors have been identified,
including, but not limited to, the following: SP1 (Raney, et al., 1992;
Kitadai, et al., 1992); NFAT-1 (Shaw, et al., 1988); Ets family of
transcription factors, including Elf1 (Thompson, et al., 1992); Fos protein
(Neuberg, et al., 1991); NF-kappa (Wirth, et al., 1988; Meijer, et al., 1992);
and AP1-like proteins, including the product of the c-jun oncogene
(Descheemaeker, et al., 1992; Ryder et al., 1988; Harshman et al., 1988; Angel
et al., 1988; Bos et al., 1988; Bohmann et al., 1987).

Accordingly, for a selected gene, non-conserved DNA surrounding the
transcription factor binding site can be chosen as a specific target sequence
for small molecule binding. A small molecule can be chosen whose binding
overlaps an adjacent transcription factor DNA binding sequence (e.g., by 1-3
nucleotide pairs). In this case, the specificity of DNA binding for the small
molecule is, in large part, derived from the non-conserved sequences adjacent
to the transcription factor binding site, in order to reduce small molecule
binding at the transcription factor binding site associated with other genes.

Small molecules that bind such specific target sequences can be
identified and/or designed using the assay and methods of the present
invention.

5. Further Considerations in Choosing Alternative
Small-Molecule-Binding Target Sites.

Small molecules that interfere with the interaction of any DNA binding
protein and its cognate DNA (i.e., DNA/Protein complexes) can be selected by
the assay and methods of the present invention. As described above for general
transcription factors, sequences adjacent to the DNA binding site for a selected
DNA binding protein can serve as a target for small molecule binding in order
to alter the interaction of the DNA binding protein and its cognate site. The
small molecule can affect the DNA:protein interaction, for example, by
inhibiting or enhancing the association of protein with the DNA.
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For a selected DNA: protein interaction, non-conserved DNA surrounding the 
selected DNA binding site can be chosen as a specific target sequence for small
molecule binding. In some cases the small molecule binding can overlap the DNA
binding site: for example, in the case of a therapeutic used to treat a mammal
with a bacterial infection, a small molecule may be selected to bind to the
bacterial origin of DNA replication. Such a small molecule may essentially
completely overlap the region defined by the bacterial origin-of-replication
DNA:protein interaction since a corresponding target sequence is not likely
present in the DNA of the mammalian host.

However, in the case where selective binding is required, as described
above for TFIID, the specificity of the small molecule for DNA binding should
essentially derive from the non-conserved sequences adjacent to the DNA-binding
protein's cognate DNA-binding site. This results in small molecule binding
being reduced at similar DNA:protein binding sites at other locations.


6. Further Considerations in Choosing Target Sites: Procaryotes
and Viruses.

Bacterial gene expression is regulated at several different levels,
including transcription. General and specific transcription factors are needed
along with the core RNA polymerase to accurately produce appropriate amounts of
mRNA. Antibiotics that bind to the RNA polymerase and prevent mRNA production
are potent bacterial poisons: molecules that could interfere with the
initiation of transcription for specific essential genes are expected to have
similar effects.

Many bacterial promoters have been sequenced and carefully examined. In
general, the majority of bacterial promoters have two well characterized
regions, the -35 region which has a consensus sequence similar to SEQ ID NO:625
and the -10 region with a consensus sequence of SEQ ID NO:626. The sequence of
the start site for RNA polymerase, however, is not always the same. The start
site is determined by a supplementary protein called the sigma factor, which
confers specificity for binding the RNA polymerase core. Several sigma factors
are present in any species of bacteria. Each sigma factor recognizes a
different set of promoter sequences. Expression of sigma factors is regulated,
typically, by the growth conditions the bacteria are encountering. These sigma
factor promoter sequences represent excellent targets for sequence specific DNA
binding molecules.

As an example of choosing target sequences for the purpose of designing a
DNA-binding therapeutic for a bacterial disease, consider the example of
tuberculosis. Tuberculosis is caused by Mycobacterium tuberculosis.
All bacteria need to make ribosomes for the purpose of protein synthesis.
The -35 and -10 regions for M. tuberculosis ribosomal RNA synthesis have been
determined. In the EMBL locus MTRRNOP the -35 signal is located at coordinates
394..400 and the -10 signal is found at coordinates 419..422. These regions
represent excellent targets for a DNA binding drug that would inhibit the
growth of the bacteria by disrupting its ability to make ribosomes and
synthesize protein. Multiple other essential genes could be targeted in a
similar manner.

M. tuberculosis is a serious public health problem for several reasons,
including the development of antibiotic resistant strains. Many antibiotics
inhibit the growth of bacteria by binding to a specific protein and inhibiting
its function. An example of this is the binding of rifampicin to the beta
subunit of the bacterial RNA polymerase. Continued selection of bacteria with
an agent of this kind can lead to the selection of mutants having an altered
RNA polymerase so that the antibiotic can no longer bind it. Such mutants can
arise from a single mutation.

However, when a drug binds to a DNA regulatory region, at least two
mutations are required to escape the inhibitory effect of the drug: one
mutation in the target DNA sequence so that the drug can no longer bind the
target sequence, and one mutation in the regulatory binding protein so that it
can recognize the new, mutated regulatory sequence. Such a double mutation
event is much less frequent than the single mutation discussed above, for
example, with rifampicin. Accordingly, it is expected that the development of
drug resistant bacteria would be much less common for DNA-binding drugs that
bind to promoter sequences.

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The HIV viral promoter region (shown in Figure 28) provides an example of 
choosing DNA target sequences for sequence-specific DNA binding drugs to 
inhibit viral replication. 

Many eukaryotic viruses use promoter regions that have similar features
to normal cellular genes. The replication of these viruses depends on the
general transcription factors present in the host cell. As such, the promoter
sequences in DNA viruses are similar to those found in cellular genes and have
been well-studied. The binding factors Sp-1 and TFIID are important
generalized factors that most viral promoters use.

In the HIV promoter sequence found in LOCUS HIVBH101 in version 32 of the
EMBL databank, three tandem decanucleotide Sp1 binding sites are located
between positions 377 and 409. Site III shows the strongest affinity for the
cellular factor. The three cause up to a tenfold effect on transcriptional
efficiency in vitro. The transcription start site is at position 455, with a
TATA box at 427-431 in the sequence listed below. In addition to these sites,
there are two NF-kappa-B sites in this region between nucleotides 350 and 373.
These sites are annotated in Figure 28.

Sequence-specific DNA binding molecules that specifically disrupted this
binding would be expected to disrupt HIV replication. For example, the
sequences adjacent to the TFIID binding site (SEQ ID NO:628 and/or SEQ ID
NO:629) would be target sites for a DNA-binding molecule designed to disrupt
TFIID binding. These sequences are found in HIV but are not likely to occur
overlapping TFIID binding sites in the endogenous human genome. Multiple sites
could be targeted to decrease the likelihood that a single mutation could
prevent drug binding.
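
The selection of target windows flanking a factor binding site can be sketched
as coordinate arithmetic. The example below uses the TATA box coordinates given
above for LOCUS HIVBH101 (427-431); the function name, the 8 bp window length,
and the 2 bp overlap are illustrative assumptions, and the windows produced are
not asserted to correspond to SEQ ID NO:628 or SEQ ID NO:629.

```python
def flanking_windows(site_start, site_end, length, overlap=0):
    """Return 1-based inclusive (start, end) windows on each side of a site.

    `overlap` is the number of base pairs by which each window extends into
    the binding site itself.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    upstream = (site_start - length + overlap, site_start - 1 + overlap)
    downstream = (site_end + 1 - overlap, site_end + length - overlap)
    return upstream, downstream


TATA_BOX = (427, 431)  # coordinates quoted in the text for LOCUS HIVBH101
print(flanking_windows(*TATA_BOX, length=8))
print(flanking_windows(*TATA_BOX, length=8, overlap=2))
```

An overlap of 1-3 base pairs mirrors the earlier suggestion that small molecule
binding may overlap the transcription factor site by a few nucleotide pairs.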

D. Using Test Matrices and Pattern Matching for the Analysis of Data.

The assay described herein has been designed to use a single DNA:protein
interaction to screen for sequence-specific and sequence-preferential
DNA-binding molecules that can recognize virtually any specified sequence. By
using sequences flanking the recognition site for a single DNA:protein
interaction, a very large number of different sequences can be tested. When the
data yielded by such experiments are displayed as matrices and analyzed by
pattern matching techniques, they should yield information about the
relatedness of DNA sequences.

The basic principle behind the DNA:protein assay of the present invention
is that when molecules bind DNA sequences flanking the recognition sequence for
a specific protein, the binding of that protein is blocked. Interference with
protein binding likely occurs by either (or both) of two mechanisms: (i)
directly by steric hindrance, or (ii) indirectly by perturbations transmitted
to the recognition sequence through the DNA molecule.

Both of these mechanisms will presumably exhibit distance effects. For
inhibition by direct steric hindrance, direct data for very small molecules are
available from methylation and ethylation interference studies. These data
suggest that for methyl and ethyl moieties, the steric effect is limited by
distance effects to 4-5 base pairs. Even so, the number of different
sequences that can theoretically be tested for these very small molecules is
still very large (i.e., 5 base pair combinations total 4^5 (=1024) different
sequences).

In practice, the size of sequences tested can be explored empirically for
different sized test DNA-binding molecules. A wide array of sequences with
increasing sequence complexity can be routinely investigated. This may be
accomplished efficiently by synthesizing degenerate oligonucleotides and
multiplexing oligonucleotides in the assay process (i.e., using a group of
different oligonucleotides in a single assay) or by employing pooled sequences
in test matrices.

In view of the above, assays employing a specific protein and
oligonucleotides containing the specific recognition site for that protein
flanked by different sequences on either side of the recognition site can be
used to simultaneously screen for many different molecules, including small
molecules, that have binding preferences for individual sequences or families
of related sequences. Figure 12 demonstrates how the analysis of a test matrix
yields information about the nature of competitor sequence specificity. As an
example, to screen for molecules that could preferentially recognize each of
the 256 possible tetranucleotide sequences (Figure 13), oligonucleotides could
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be constructed that contain these 256 sequences immediately adjacent to a 11 bp 
recognition sequence of UL9 oriS SEQ ID NO: 615, which is identical in each 
construct. 

In Figure 12, "+" indicates that the mixture retards or blocks the
formation of DNA:protein complexes in solution and "-" indicates that the
mixture had no marked effect on DNA:protein interactions. The results of this
test are shown in Table V.



Table V

Test Mix      Specificity
#1, 4, 7      none detected for the above oligos
#2            either nonspecific or specific for recognition site
#3            AGCT
#5            CATT or ATT
#6            GCATTC, GCATT, CATTC, GCAT, or ATTC
#8            CTTT



These results demonstrate how such a matrix provides data on the presence
of sequence specific binding activity in a test mixture and also provides
inherent controls for non-specific binding. For example, the effect of test
mix #8 on the different test assays reveals that the test mix preferentially
affects the oligonucleotides that contain the sequence CCCT. Note that the
sequence does not have to be within the test site for test mix #8 to exert an
effect. By displaying the data in a matrix, the analysis of the sequences
affected by the different test mixtures is facilitated.

Furthermore, defined, ordered sets of oligonucleotides can be screened 
with a chosen DNA-binding molecule. The results of these binding assays can 
then be examined using pattern matching techniques to determine the subsets of 
sequences that bind the molecule with similar binding characteristics. If the 
structural and biophysical properties (such as geometric shape and
electrostatic properties) of sequences are similar, then it is likely that they 
will bind the molecule with similar binding characteristics. If the structural 
and biophysical properties of sequences are different, then it is likely that 
they will not bind the molecule with similar binding characteristics. In this 
context, the assay might be used to group defined, ordered sequences into 
subsets based on their binding characteristics: for example, the subsets could 
be defined as high affinity binding sites, moderate affinity binding sites, and 
low affinity binding sites. Sequences in the subsets with positive attributes 
(e.g., high affinity binding) have a high probability of having similar 
structural and biophysical properties to one another. 
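
The grouping of sequences into affinity subsets can be sketched as a simple
thresholding step. In the example below, each score is a relative affinity
normalized so that an average site equals 1.0; the function name, the threshold
values, and the scores are hypothetical and are not taken from assay data.

```python
def classify_sites(relative_affinity, high=10.0, moderate=3.0):
    """Sort sequences into high, moderate, and low affinity subsets.

    `relative_affinity` maps each test sequence to its affinity relative to an
    average site. The `high` and `moderate` cutoffs are illustrative only.
    """
    subsets = {"high": [], "moderate": [], "low": []}
    for seq, score in sorted(relative_affinity.items()):
        if score >= high:
            subsets["high"].append(seq)
        elif score >= moderate:
            subsets["moderate"].append(seq)
        else:
            subsets["low"].append(seq)
    return subsets


# Hypothetical relative affinities for one DNA-binding molecule.
example_scores = {"AGCT": 25.0, "CATT": 4.0, "GGGG": 1.0, "ACGT": 12.0}
print(classify_sites(example_scores))
```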

By screening and analyzing the binding characteristics of a number of 
DNA-binding molecules against the same defined set of DNA sequences, data can 
be accumulated about the subsets of sequences that fall into the same or 
similar subsets. Using this pattern matching approach, which can be
computer-assisted, the sequences with similar structural and biophysical
properties can be grouped empirically.
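
One simple, computer-assisted form of this pattern matching is to group
sequences that receive the same subset label for every molecule screened. The
sketch below assumes each molecule's results have already been reduced to
labels such as "high" or "low"; the function name, molecule names, and labels
are hypothetical.

```python
from collections import defaultdict


def group_by_profile(labels_by_molecule):
    """Group sequences that share the same label across all molecules.

    `labels_by_molecule` maps molecule name -> {sequence: subset label}.
    Only sequences screened against every molecule are compared.
    """
    molecules = sorted(labels_by_molecule)
    shared = set.intersection(*(set(v) for v in labels_by_molecule.values()))
    groups = defaultdict(list)
    for seq in sorted(shared):
        profile = tuple(labels_by_molecule[m][seq] for m in molecules)
        groups[profile].append(seq)
    return dict(groups)


# Hypothetical labels for two screened molecules.
example_labels = {
    "drugA": {"AGCT": "high", "ACGT": "high", "TTTT": "low"},
    "drugB": {"AGCT": "low", "ACGT": "low", "TTTT": "low"},
}
print(group_by_profile(example_labels))
```

Sequences that land in the same group across many molecules are candidates for
having similar structural and biophysical properties.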

The database arising from pattern matching analysis of raw assay data
will lead to the increased understanding of sequence structure and thereby lead
to the design of novel DNA-binding molecules with related but different binding
activities.

E. Applications for the Determination of the Sequence Specificity of
DNA-Binding Drugs.

Applications for the determination of the sequence specificity of
DNA-binding drugs are described below. The applications are divided into
homo- and heteromeric drug polymers (part 1) and sequence-specific DNA-binding
molecules as facilitators of triple strand formation (part 2).

One utility of the assay of the invention is the identification of 
highest affinity binding sites among all possible sites of a certain length for 
a given DNA-binding molecule. This information may be valuable to the design 
of new DNA-binding therapeutics. 
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1. Multimerization of Sequence-Preferential or Sequence-Specific 
DNA-Binding Molecules Identified in the Assay.

Any particular DNA-binding small molecule screened in the assay may only
recognize a 2-4 base pair site, and even if the recognition is quite specific,
the molecule may be toxic because there are so many target sites in the genome
(3 x 10^9/4^4 occurrences of a given 4 bp site, for example). However, if drugs
with differential affinity for different sites are identified, the toxicity of
DNA-binding drugs may be drastically reduced by creating dimers, trimers, or
multimers with these drugs (Example 13). From theoretical considerations of the
free energy changes accompanying the binding of drugs to DNA, the intrinsic
binding constant of a dimer should be the square of the binding constant of the
monomer (Le Pecq, J.B.). Experimental data confirmed this expectation in 1978
with dimer analogs of ethidium bromide (Kuhlmann, et al.). Dimerization of
several intercalating molecules, in fact, yields compounds with DNA affinities
raised from 10^[illegible] M^-1 for the corresponding monomer to 10^8 to 10^9
M^-1 for the dimers (Skorobogaty, et al.; Gaugain, et al. (1978a and b); Le
Pecq, et al.; Pelaprat, et al.). Trimerization, which theoretically should
yield binding affinities that are the cube of the affinity of the homomonomeric
subunit or the product of affinities of the heteromonomeric subunits, has
yielded compounds with affinities as high as 10^12 M^-1 (Laugaa, et al.). Such
affinity is markedly better than the affinities seen for many DNA regulatory
proteins.

As a hypothetical example, if a relatively weak DNA-binding drug, drug X,
which binds a 4 bp site with an affinity of 2 x 10^5 M^-1, were dimerized, the
bis-X drug would now recognize an 8 bp site with a theoretical affinity of
4 x 10^10 M^-1. The difference in affinity between the monomer X and the bis-X
form is 200,000-fold. The number of 4 bp sites in the genome is approximately
1.2 x 10^7 versus the number of 8 bp sites in the genome, which is approximately
5 x 10^4. Accordingly, there are 256-fold fewer 8 bp sites than 4 bp sites.
Thus, the number of high affinity target sites is 256-fold fewer for the bis-X
molecule than the number of low affinity target sites for the monomer X, with a
200,000-fold difference in affinity between the two types of sites.
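
The arithmetic of the drug X example can be checked with a short calculation.
This sketch simply restates the numbers in the text; the function and variable
names are illustrative, and the model (dimer affinity equal to the square of
the monomer affinity, random genome of 3 x 10^9 bp) is the theoretical
idealization described above.

```python
GENOME_BP = 3e9  # approximate human genome size used throughout the text


def multimer_affinity(k_monomer, n_subunits):
    """Theoretical affinity of a homomultimer: monomer affinity to the nth power."""
    return k_monomer ** n_subunits


def expected_sites(site_len, genome_bp=GENOME_BP):
    """Average number of occurrences of one site of the given length."""
    return genome_bp / 4 ** site_len


k_monomer = 2e5                                       # drug X, 4 bp site
k_dimer = multimer_affinity(k_monomer, 2)             # bis-X, 8 bp site
affinity_gain = k_dimer / k_monomer                   # fold difference
site_reduction = expected_sites(4) / expected_sites(8)

print(k_dimer, affinity_gain, site_reduction)
```

The two ratios printed last are the 200,000-fold affinity difference and the
256-fold reduction in the number of target sites.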

Since the binding constant of a dimer is the product of the binding
constants of the monomers, when monomers with higher initial binding constants
are formed into dimers (or multimers) the differential effect is
proportionately increased, creating a wider "window" of affinity versus the
number of binding sites. The breadth of the window essentially reflects the
margin of effective drug concentration compared to the relative toxicity.

There are two immediate ramifications of dimerization (or
multimerization) of monomeric drugs with moderate toxicity and sequence
preference. First, the concentration of drug needed is lowered because of the
higher affinity, so that even relatively toxic molecules can be used as drugs.
Second, since toxicity is likely linked to the average number of drug
molecules bound to the genome, as specificity is increased by increasing the
length of the binding site, toxicity is decreased.

Given the information already available on sequence-preferential binding
of DNA-binding drugs, it is likely that each drug presented to the screening
assay will have (i) a number of high affinity binding sites (e.g., 10 to
100-fold better affinity than the average site), (ii) a larger number of sites
that are bound with moderate affinity (3 to 10-fold better affinity than
average), (iii) the bulk of the binding sites having average affinity, and (iv)
a number of sites having worse-than-average affinity. This range of binding
affinities will likely resemble a bell-shaped curve. The shape of the curve will
probably vary for each drug. To exemplify, assume that approximately five 4 bp
sites will be high affinity binding sites, and twenty 4 bp sites will be
moderately high affinity binding sites; then any given drug may recognize
roughly 25 high or moderately high affinity binding sites. If 50 to 100 drugs
are screened, this represents a "bank" of potentially 250-500 high affinity
sites and 1000-2500 moderately high affinity sites. Thus, the probability of
finding a number of high affinity drug binding sites that match medically
significant target sites is good. Furthermore, heterodimeric drugs can be
designed to match DNA target sites of 8 or more bp, lending specificity to the
potential pharmaceuticals.
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As discussed above, once the sequence preferences are known, the 
information may be used to design oligomeric molecules (homopolymers or
heteropolymers) with substantially greater sequence specificity and
substantially higher binding affinity. For example, if a DNA-binding molecule,
X, binds a 4 bp sequence 5'-ACGT-3'/5'-ACGT-3' with an equilibrium affinity
constant of 2 x 10^5 M^-1, then the dimer of X, X2, should bind the dimer of the
sequence, 5'-ACGTACGT-3'/5'-ACGTACGT-3', with an equilibrium affinity constant
of (2 x 10^5 M^-1)^2 = 4 x 10^10 M^-2. The DNA-binding dimer molecule, X2,
recognizes an 8 bp sequence, conferring higher sequence specificity, with a
binding affinity that is theoretically 200,000-fold higher than that of the
DNA-binding monomer, X.

The same argument can be extended to trimer molecules: the trimer of X,
X3, would bind a 12 bp sequence, 5'-ACGTACGTACGT-3'/5'-ACGTACGTACGT-3', with a
theoretical equilibrium affinity constant of 8 x 10^15 M^-2.
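
The dimer and trimer targets above are tandem repeats of the monomer site, and
the theoretical affinities are powers of the monomer constant. The following
sketch, with illustrative function names, builds the target duplex for an
n-subunit homopolymer and computes the corresponding idealized affinity; it
assumes subunits bind adjacent, non-overlapping copies of the monomer site.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def homopolymer_target(monomer_site, n_subunits):
    """Return (top, bottom) strands of the tandem-repeat target, both 5'->3'."""
    top = monomer_site * n_subunits
    return top, reverse_complement(top)


def theoretical_affinity(k_monomer, n_subunits):
    """Idealized affinity: the monomer constant raised to the subunit count."""
    return k_monomer ** n_subunits


for n in (1, 2, 3):
    top, bottom = homopolymer_target("ACGT", n)
    print(n, top, bottom, theoretical_affinity(2e5, n))
```

Because ACGT is self-complementary, the top and bottom strands read the same 5'
to 3', which matches the notation used in the text.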

DNA-binding polymers constructed using the above-mentioned approach may
be homo- or hetero-polymers of the parent compounds or oligomeric compounds
composed of mixed subunits of the parent compounds. Homopolymers are molecules
constructed using two or more subunits of the same monomeric DNA-binding
molecule. Heteropolymers are molecules constructed using two or more subunits
of different monomeric DNA-binding molecules. Oligomeric compounds are
constructed of mixed pieces of parent compounds and may be hetero- or
homomeric.

For example, distamycin is a member of a family of non-intercalating
minor groove DNA-binding oligopeptides that are composed of repeating units of
N-methylpyrrole groups. Distamycin has 3 N-methylpyrrole groups. Examples of
homopolymers would be bis-distamycin, the dimer of distamycin, a molecule
containing 6 N-methylpyrrole groups, or tris-distamycin, the trimer of
distamycin, a molecule containing 9 N-methylpyrrole groups.

Daunomycin is a member of an entirely different class of DNA-binding
molecules, the anthracycline antibiotics, that bind to DNA via intercalation.
Heteropolymers are molecules composed of different types of DNA-binding
subunits; for example, compounds composed of a distamycin molecule linked to a
daunomycin molecule or a distamycin molecule linked to two daunomycin
molecules. The term "oligomeric" is being used to describe molecules comprised
of linked subunits each of which may be smaller than the parent compound.

An example of a homo-oligomeric compound would be a distamycin molecule
linked to 1 or 2 additional N-methylpyrrole groups; the resulting molecule
would not be as large as bis-distamycin, but would fundamentally be composed of
the same component organic moieties that comprise the parent molecule.
An example of a hetero-oligomeric compound would be daunomycin linked to one or
two N-methylpyrrole groups.

The construction of these polymers will be directed by the information
derived from the sequence preferences of the parent compounds tested in the
assay. In one embodiment of the assay, a database of preferred sequences is
constructed, providing a source of information about the 4 bp sequences that
bind with relatively higher affinity to particular drugs that may be linked
together to target any particular larger DNA sequence.
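As a minimal sketch of how such a database might be consulted, the following Python fragment splits a larger target sequence into consecutive 4 bp windows and lists the subunits whose preferred sites match each window. The subunit names, the site lists, and the function `tile_target` are hypothetical placeholders introduced only for illustration; they are not measured sequence preferences and are not part of the assay described here.

```python
# Hypothetical database: subunit name -> set of preferred 4 bp sites.
# The entries are placeholders, not measured binding preferences.
PREFERRED_SITES = {
    "subunit_A": {"AATT", "ATAT"},
    "subunit_B": {"CGTA", "GGCC"},
}


def tile_target(target, database, k=4):
    """Split `target` into consecutive k-bp windows and, for each window,
    list the subunits whose preferred sites include that window."""
    if len(target) % k != 0:
        raise ValueError("target length must be a multiple of k")
    plan = []
    for start in range(0, len(target), k):
        window = target[start:start + k].upper()
        binders = sorted(
            name for name, sites in database.items() if window in sites
        )
        plan.append((window, binders))
    return plan


# An 8 bp target is covered by two 4 bp windows.
plan = tile_target("AATTGGCC", PREFERRED_SITES)
```

A window with an empty list of binders would indicate that no subunit in the database prefers that 4 bp site, so another subunit or a different window registration would have to be considered.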

DNA-binding subunits can be chemically coupled to form heteropolymers or
homopolymers. The subunits can be joined directly to each other, as in the
family of distamycin molecules, or the subunits can be joined with a spacer
molecule, such as carbon chains or peptide bonds. The coupling of subunits is
dependent on the chemical nature of the subunits: appropriate coupling
reactions can be determined for any two subunit molecules from the chemical
literature. The choice of subunits will be directed by the sequence to be
targeted and the data accumulated through the methods discussed in Section VI.B
of this application.

2. Sequence-Specific DNA-Binding Molecules Identified in the
Assay as Facilitators of Triplex Formation.

Several types of nucleic acid base-containing polymers have been
described that will form complexes with nucleic acids (for reviews, see Helene,
C. and Toulme, J.-J.). One type of such a polymer forms a triple-stranded
complex by the insertion of a third strand into the major groove of the DNA
helix. Several types of base-recognition specific interactions of third strand
oligonucleotide-type polymers have been observed. One type of specificity is
due to Hoogsteen bonding (Hoogsteen). This specificity arises from recognition
between pyrimidine oligonucleotides and double-stranded DNA by pairing thymine
with adenine:thymine base pairs and protonated cytosine with guanine:cytosine
base pairs (Griffin, et al.). Another type of specific interaction involves
the use of purine oligonucleotides for triplex formation. In these triplexes,
adenine pairs with adenine:thymine base pairs and guanine with guanine:cytosine
(Cooney, et al.; Beal and Dervan) or thymine:adenine base pairs (Griffin, L.,
and Dervan, P. B.).

Other motifs for triplex formation have been described, including the
incorporation of nucleic acid analogs (e.g., methylphosphonates,
phosphorothioates; Miller, et al.), and the invention of backbones other than
the phosphoribose backbones normally found in nucleic acids (Pitha, et al.;
Summerton, et al.). In several cases, the formation of triplex has been
demonstrated to inhibit the binding of a DNA-binding protein (e.g., Young, et
al.; Maher, et al.) or the expression of a cellular protein (Cooney, et al.).

Furthermore, several experiments have been reported in which a small
DNA-binding molecule has been covalently attached to a polymer capable of
forming a triplex structure: (i) an acridine:polypyrimidine molecule has been
demonstrated to inhibit SV40 in CV-1 cells (Birg, et al.); (ii) cleavage at a
single site in a yeast chromosome was achieved with an oligonucleotide:EDTA-Fe
molecule (Strobel, et al.; Dervan); and (iii) a photoinducible endonuclease was
created by a similar strategy by attaching an ellipticine derivative to a
homopyrimidine oligonucleotide (Perouault, et al.). Several other small
intercalating agents coupled to oligonucleotides have been described (for
review, see Montenay-Garestier).

One utility of the assay of the present invention is to identify the
sequence-specificity of DNA-binding molecules for use in designing and
synthesizing heteromeric therapeutics consisting of a DNA-binding polymer
(e.g., an oligonucleotide) attached to a sequence-preferential or
sequence-specific DNA-binding molecule, yielding a heteropolymer. The attached
small molecule may serve several functions.

First, if the molecule has increased affinity for a specific site (such
as a particular 4 base pair sequence) over all other sites of the same size,
then the local concentration of the hetero-molecule, including the
oligonucleotide, will be increased at those sites. The amount of heteropolymer,
containing a sequence-specific moiety attached to one end, needed for treatment
purposes is reduced compared to a heteropolymer that has a non-specific
DNA-binding moiety attached. This reduction in treatment amount is directly
proportional to both the differential specificity and the relative affinities
between the sequence-specific binder and the non-specific binder. For the
simplest example, if a sequence-specific molecule with absolute specificity
(i.e., it binds only one sequence) had the same affinity for a specific 4
base-pair target site (1 of 256 possible combinations) as a non-specific
molecule, then the amount of drug needed to exert the same effective
concentration at that site could potentially be as much as 256-fold less for
the specific drug than for the non-specific drug. Accordingly, attaching a
sequence-specific DNA-binding molecule to a polymer designed to form triplex
structures allows increased localized concentrations.
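The figure of 256 follows from counting the possible 4 base-pair sequences (4 bases at each of 4 positions). The short Python sketch below enumerates them and restates the idealized upper bound on the dose reduction. The function names are illustrative, and the bound holds only under the assumptions stated above: absolute specificity, and equal affinity of the specific and non-specific molecules at the target site.

```python
from itertools import product


def all_sites(k=4, alphabet="ACGT"):
    """Return every possible k-bp sequence, reading one strand 5' to 3'."""
    return ["".join(bases) for bases in product(alphabet, repeat=k)]


def max_fold_reduction(k=4):
    """Idealized upper bound on the dose reduction: a non-specific binder is
    spread over all 4**k sites, an absolutely specific binder over one."""
    return 4 ** k


sites = all_sites(4)
n_sites = len(sites)
fold = max_fold_reduction(4)
```

In practice the reduction would be smaller, since real molecules show sequence preference and not absolute specificity.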

A second utility of the assay of the present invention is to identify
small molecules that cause conformational changes in the DNA when they bind.
The formation of triplex DNA requires a shift from B form to A form DNA. This
is not energetically favorable, necessitating the use of increased amounts of
polymer for triplex formation to drive the conformational change. However, the
insertion of a small DNA-binding molecule (such as actinomycin D), which
induces a conformational change in the DNA, reduces the amount of polymer
needed to stabilize triplex formation.

Accordingly, one embodiment of the invention is to use the assay to test
known DNA-binding molecules with all 256 possible four base pair test sequences
to determine the relative binding affinity to all possible 4 bp sequences.
Then, once the sequence preferences are known, the information may be used to
design heteropolymeric molecules comprised of a small DNA-binding molecule and
a macromolecule, such as a triplex-forming oligonucleotide, to obtain a
DNA-binding molecule with enhanced binding characteristics. The potential
advantages of attaching a sequence-specific or sequence-preferential
DNA-binding small molecule to a triplex-forming molecule are to (i) target the
triplex to a subset of specific DNA sequences and thereby (ii) anchor the
triplex molecule in the vicinity of its target sequence and, in doing so, (iii)
increase the localized concentration of the triplex molecule, which allows
(iv) lower concentrations of triplex to be used effectively. The presence of
the small molecule may also facilitate localized perturbations in DNA
structure, such as destabilizing the B form of DNA, which is unsuitable for
triplex formation. Such destabilization may facilitate the formation of other
structures, such as A form DNA, useful for triplex formation. The net effect
would be to decrease the amount of triplex needed for efficacious results.

F. Other Applications.

The potential pharmaceutical applications for sequence-specific
DNA-binding molecules are very broad, including antiviral, antifungal,
antibacterial, antitumor agents, immunosuppressants, and cardiovascular drugs.

Sequence-specific DNA-binding molecules can also be useful as molecular 
reagents as, for example, specific sequence probes. 

As more DNA-binding molecules are detected, information about their DNA
binding affinities, sequence recognition, and mechanisms of DNA-binding will be
gathered, eventually facilitating the design and/or modification of new
molecules with different or specialized activities.

Although the assay has been described in terms of the detection of
sequence-specific DNA-binding molecules, the reverse assay could be achieved by
adding DNA in excess to protein to look for peptide sequence specific
protein-binding inhibitors.

The following examples illustrate, but in no way are intended to limit,
the present invention.

Materials and Methods

Synthetic oligonucleotides were prepared using commercially available
automated oligonucleotide synthesizers. Alternatively, custom designed
synthetic oligonucleotides may be purchased, for example, from Synthetic
Genetics (San Diego, CA). Complementary strands were annealed to generate
double-strand oligonucleotides.

Restriction enzymes were obtained from Boehringer Mannheim (Indianapolis
IN) or New England Biolabs (Beverly MA) and were used as per the manufacturer's
directions.

Distamycin A and Doxorubicin were obtained from Sigma (St. Louis, MO).
Actinomycin D was obtained from Boehringer Mannheim or Sigma.

Standard cloning and molecular biology techniques are described in
Ausubel, et al., and Sambrook, et al.

Example 1
Preparation of the Oligonucleotide
Containing the Screening Sequence

This example describes the preparation of (A)
biotinylated/digoxigenin/radiolabeled, and (B) radiolabeled double-stranded
oligonucleotides that contain the screening sequence and selected Test
sequences.

A. Biotinylation.

The oligonucleotides were prepared as described above. The wild-type
control sequence for the UL9 binding site, as obtained from HSV, is shown in
Figure 4. The screening sequence, i.e. the UL9 binding sequence, is
CGTTCGCACTT (SEQ ID NO: 601) and is underlined in Figure 4. Typically,
sequences 5' and/or 3' to the screening sequence were replaced by a selected
Test sequence (Figure 5).

One example of the preparation of a site-specifically biotinylated
oligonucleotide is outlined in Figure 4. An oligonucleotide primer
complementary to the 3' sequences of the screening sequence-containing
oligonucleotide was synthesized. This oligonucleotide terminated at the residue
corresponding to the C in position 9 of the screening sequence. The primer
oligonucleotide was hybridized to the oligonucleotide containing the screening
sequence. Biotin-11-dUTP (Bethesda Research Laboratories (BRL), Gaithersburg
MD) and Klenow enzyme were added to this complex (Figure 4) and the resulting
partially double-stranded biotinylated complexes were separated from the
unincorporated nucleotides using either pre-prepared "G-25 SEPHADEX" spin
columns (Pharmacia, Piscataway NJ) or "NENSORB" columns (New England Nuclear)
as per the manufacturer's instructions. The remaining single-strand region was
converted to double-strands using DNA polymerase I Klenow fragment and dNTPs,
resulting in a fully double-stranded oligonucleotide. A second "G-25 SEPHADEX"
column was used to purify the double-stranded oligonucleotide. Oligonucleotides
were diluted or resuspended in 10 mM Tris-HCl, pH 7.5, 50 mM NaCl, and 1 mM
EDTA and stored at -20°C. For radiolabelling the complexes, 32P-alpha-dCTP (New
England Nuclear, Wilmington, DE) replaced dCTP for the double-strand completion
step.

Alternatively, the top strand, the primer, or the fully double-stranded
oligonucleotide has been radiolabeled with γ-32P-ATP and polynucleotide kinase
(NEB, Beverly, MA). Most of our preliminary studies have employed
radiolabeled, double-stranded oligonucleotides. The oligonucleotides are
prepared by radiolabeling the primer with T4 polynucleotide kinase and
γ-32P-ATP, annealing the "top" strand full length oligonucleotide, and
"filling-in" with Klenow fragment and deoxynucleotide triphosphates. After
phosphorylation and second strand synthesis, oligonucleotides are separated
from buffer and unincorporated triphosphates using "G-25 SEPHADEX" preformed
spin columns (IBI, New Haven, CT or Biorad, Richmond CA). This process is
outlined in Figure 4.

The reaction conditions for all of the above Klenow reactions were as follows:
10 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 50 mM NaCl, 1 mM dithioerythritol,
0.33-100 µM deoxytriphosphates, 2 units Klenow enzyme (Boehringer-Mannheim,
Indianapolis IN). The Klenow reactions were incubated at 25°C for 15 minutes
to 1 hour. The polynucleotide kinase reactions were incubated at 37°C for 30
minutes to 1 hour.
B. End-Labeling with Digoxigenin.

The biotinylated, radiolabeled oligonucleotides or radiolabeled
oligonucleotides were isolated as above and resuspended in 0.2 M potassium
cacodylate (pH=7.2), 4 mM MgCl2, 1 mM 2-mercaptoethanol, and 0.5 mg/ml bovine
serum albumin. To this reaction mixture digoxigenin-11-dUTP (an analog of dTTP,
2'-deoxy-uridine-5'-triphosphate, coupled to digoxigenin via an 11-atom spacer
arm, Boehringer Mannheim, Indianapolis IN) and terminal deoxynucleotidyl
transferase (GIBCO BRL, Gaithersburg, MD) were added. The number of
Dig-11-dUTP moieties incorporated using this method appeared to be less than 5
(probably only 1 or 2) as judged by electrophoretic mobility on polyacrylamide
gels of the treated fragment as compared to oligonucleotides of known length.

The biotinylated or non-biotinylated, digoxigenin-containing,
radiolabeled oligonucleotides were isolated as above and resuspended in 10 mM
Tris-HCl, 1 mM EDTA, 50 mM NaCl, pH 7.5 for use in the binding assays.

The above procedure can also be used to biotinylate the other strand by
using an oligonucleotide containing the screening sequence complementary to the
one shown in Figure 4 and a primer complementary to the 3' end of that
molecule. To accomplish the biotinylation, Biotin-7-dATP was substituted for
Biotin-11-dUTP. Biotinylation was also accomplished by chemical synthetic
methods: for example, an activated nucleotide is incorporated into the
oligonucleotide and the active group is subsequently reacted with NHS-LC-Biotin
(Pierce). Other biotin derivatives can also be used.

C. Radiolabelling the Oligonucleotides.

Generally, oligonucleotides were radiolabeled with gamma-32P-ATP or
alpha-32P-deoxynucleotide triphosphates and T4 polynucleotide kinase or the
Klenow fragment of DNA polymerase, respectively. Labelling reactions were
performed in the buffers and by the methods recommended by the manufacturers
(New England Biolabs, Beverly MA; Bethesda Research Laboratories, Gaithersburg
MD; or Boehringer/Mannheim, Indianapolis IN). Oligonucleotides were separated
from buffer and unincorporated triphosphates using "G-25 SEPHADEX" preformed
spin columns (IBI, New Haven, CT; or Biorad, Richmond, CA) or "NENSORB"
preformed columns (New England Nuclear, Wilmington, DE) as per the
manufacturer's instructions.

There are several reasons to enzymatically synthesize the second strand.
The two main reasons are as follows. First, by using an excess of primer,
second strand synthesis can be driven to near completion so that nearly all top
strands are annealed to bottom strands, which prevents the single-stranded top
strands from folding back and creating additional and unrelated double-stranded
structures. Second, since all of the oligonucleotides are primed with a common
primer, the primer can bear the end-label so that all of the oligonucleotides
will be labeled to exactly the same specific activity.

Example 2 
Preparation of the UL9 Protein 

A. Cloning of the UL9 Protein-Coding Sequences into pAC373.

To express full length UL9 protein a baculovirus expression system has
been used. The sequence of the UL9 coding region of Herpes Simplex Virus has
been disclosed by McGeoch et al. and is available as an EMBL nucleic acid
sequence. The recombinant baculovirus AcNPV/UL9A, which contained the UL9
protein-coding sequence, was obtained from Mark Challberg (National Institutes
of Health, Bethesda MD). The construction of this vector has been previously
described (Olivo, et al. (1988, 1989)). Briefly, the NarI/EcoRV fragment was
derived from pMC160 (Wu, et al.). Blunt-ends were generated on this fragment
by using all four dNTPs and the Klenow fragment of DNA polymerase I (Boehringer
Mannheim, Indianapolis IN) to fill in the terminal overhangs. The resulting
fragment was blunt-end ligated into the unique BamHI site of the baculoviral
vector pAC373 (Summers, et al.).

B. Cloning of the UL9 Sequence in pVL1393.

The UL9 protein-coding region was cloned into a second baculovirus
vector, pVL1393 (Luckow, et al.). The 3077 bp NarI/EcoRV fragment containing
the UL9 gene was excised from vector pEcoD (obtained from Dr. Bing LanRong,
Eye Research Institute, Boston, MA): the plasmid pEcoD contains a 16.2 kb
EcoRI fragment derived from HSV-I that bears the UL9 gene (Goldin, et al.).
Blunt-ends were generated on the UL9-containing fragment as described above.
EcoRI linkers (10 mer) were blunt-end ligated (Ausubel, et al.; Sambrook, et
al.) to the blunt-ended NarI/EcoRV fragment.

The vector pVL1393 (Luckow, et al.) was digested with EcoRI and the
linearized vector isolated. This vector contains 35 nucleotides of the 5' end
of the coding region of the polyhedron gene upstream of the polylinker cloning
site. The polyhedron gene ATG has been mutated to ATT to prevent translational
initiation in recombinant clones that do not contain a coding sequence with a
functional ATG. The EcoRI/UL9 fragment was ligated into the linearized vector,
the ligation mixture transformed into E. coli, and ampicillin resistant clones
selected. Plasmids recovered from the clones were analyzed by restriction
digestion, and plasmids carrying the insert with the amino terminal UL9
protein-coding sequences oriented to the 5' end of the polyhedron gene were
selected. This plasmid was designated pVL1393/UL9 (Figure 7).

pVL1393/UL9 was cotransfected with wild-type baculoviral DNA (AcMNPV;
Summers, et al.) into Sf9 (Spodoptera frugiperda) cells (Summers, et al.).
Recombinant baculovirus-infected Sf9 cells were identified and clonally
purified (Summers, et al.).

C. Expression of the UL9 Protein.

Clonal isolates of recombinant baculovirus infected Sf9 cells were grown
in Grace's medium as described by Summers, et al. The cells were scraped from
tissue culture plates and collected by centrifugation (2,000 rpm, for 5
minutes, 4°C). The cells were then washed once with phosphate buffered saline
(PBS) (Maniatis, et al.). Cell pellets were frozen at -70°C. For lysis the
cells were resuspended in 1.5 volumes 20 mM HEPES, pH 7.5, 10% glycerol, 1.7 M
NaCl, 0.5 mM EDTA, 1 mM dithiothreitol (DTT), and 0.5 mM phenyl methyl sulfonyl
fluoride (PMSF). Cell lysates were cleared by ultracentrifugation (Beckman
table top ultracentrifuge, TLS 55 rotor, 34 krpm, 1 hr, 4°C). The supernatant
was dialyzed overnight at 4°C against 2 liters dialysis buffer (20 mM HEPES, pH
7.5, 10% glycerol, 50 mM NaCl, 0.5 mM EDTA, 1 mM DTT, and 0.1 mM PMSF).

These partially purified extracts were prepared and used in DNA:protein
binding experiments. If necessary, extracts were concentrated using a
"CENTRICON 30" filtration device (Amicon, Danvers MA).
D. Cloning the Truncated UL9 Protein.

The sequence encoding the C-terminal third of UL9 and the 3' flanking
sequences, an approximately 1.2 kb fragment, was subcloned into the bacterial
expression vector, pGEX-2T (Figure 6). The pGEX-2T is a modification of the
pGEX-1 vector of Smith, et al. which involved the insertion of a thrombin
cleavage sequence in-frame with the glutathione-S-transferase protein (gst).

A 1,194 bp BamHI/EcoRV fragment of pEcoD was isolated that contained a
951 bp region encoding the C-terminal 317 amino acids of UL9 and 243 bp of the
3' untranslated region.
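The fragment sizes quoted above are mutually consistent, which can be confirmed with a brief arithmetic check. The Python lines below are only a sanity check of the stated numbers; the variable names are illustrative.

```python
# Each amino acid is encoded by one 3-nucleotide codon.
CODON_LENGTH = 3

coding_bp = 317 * CODON_LENGTH   # region encoding the C-terminal 317 amino acids
untranslated_bp = 243            # 3' untranslated region, as stated in the text
fragment_bp = coding_bp + untranslated_bp  # expected BamHI/EcoRV fragment size
```

The coding region evaluates to 951 bp and the full fragment to 1,194 bp, matching the sizes given for the BamHI/EcoRV fragment.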
This BamHI/EcoRV UL9 carboxy-terminal (UL9-COOH) containing fragment was
blunt-ended and EcoRI linkers added as described above. The EcoRI linkers were
designed to allow in-frame fusion of the UL9 protein-coding sequence to the
gst-thrombin coding sequences. The linkered fragment was isolated and digested
with EcoRI. The pGEX-2T vector was digested with EcoRI, treated with Calf
Intestinal Alkaline Phosphatase (CIP) and the linear vector isolated. The
EcoRI linkered UL9-COOH fragment was ligated to the linear vector (Figure 6).
The ligation mixture was transformed into E. coli and ampicillin resistant
colonies were selected. Plasmids were isolated from the ampicillin resistant
colonies and analyzed by restriction enzyme digestion. A plasmid which
generated a gst/thrombin/UL9-COOH in frame fusion was identified (Figure 6) and
designated pGEX-2T/UL9-COOH.

E. Expression of the Truncated UL9 Protein.

E. coli strain JM109 was transformed with pGEX-2T/C-UL9-COOH and was
grown at 37°C to saturation density overnight. The overnight culture was
diluted 1:10 with LB medium containing ampicillin and grown for one hour at
30°C. IPTG (isopropylthio-β-galactoside) (GIBCO-BRL) was added to a final
concentration of 0.1 mM and the incubation was continued for 2-5 hours.
Bacterial cells containing the plasmid were subjected to the temperature shift
and IPTG conditions, which induced transcription from the tac promoter.

Cells were harvested by centrifugation and resuspended in 1/100 culture
volume of MTPBS (150 mM NaCl, 16 mM Na2HPO4, 4 mM NaH2PO4). Cells were lysed by
sonication and lysates cleared of cellular debris by centrifugation.

The fusion protein was purified over a glutathione agarose affinity
column as described in detail by Smith, et al. The fusion protein was eluted
from the affinity column with reduced glutathione, dialyzed against UL9
dialysis buffer (20 mM HEPES pH 7.5, 50 mM NaCl, 0.5 mM EDTA, 1 mM DTT, 0.1 mM
PMSF) and cleaved with thrombin (2 ng/µg of fusion protein).

An aliquot of the supernatant obtained from IPTG-induced cultures of
pGEX-2T/C-UL9-COOH-containing cells and an aliquot of the affinity-purified,
thrombin-cleaved protein were analyzed by SDS-polyacrylamide gel
electrophoresis. The result of this analysis is shown in Figure 8. The 63
kilodalton GST/C-UL9 fusion protein is the largest band in the lane marked
GST-UL9 (lane 2). The first lane contains protein size standards. The UL9-COOH
protein band (lane GST-UL9 + Thrombin, Figure 8, lane 3) is the band located
between 30 and 46 kD: the glutathione transferase protein is located just
below the 30 kD size standard. In a separate experiment a similar analysis was
performed using the uninduced culture: it showed no protein corresponding in
size to the fusion protein.

Extracts are dialyzed before use. Also, if necessary, the extracts can
be concentrated, typically by filtration using a "CENTRICON 30" filter.

Example 3 
Binding Assays 

A. Band Shift Gels.

DNA:protein binding reactions containing both labelled complexes and free
DNA were separated electrophoretically on 4-10% polyacrylamide/Tris-Borate-EDTA
(TBE) gels (Fried, et al.; Garner, et al.). The gels were then fixed, dried,
and exposed to X-ray film. The autoradiograms of the gels were examined for
band shift patterns.
B. Filter Binding Assays.

A second method, used particularly in determining the half-lives for
oligonucleotide:protein complexes, is filter binding (Woodbury, et al.).
Nitrocellulose disks (Schleicher and Schuell, BA85 filters) that have been
soaked in binding buffer (see below) were placed on a vacuum filter apparatus.
DNA:protein binding reactions (see below; typically 15-30 µl) are diluted to
0.5 ml with binding buffer (this dilutes the concentration of components
without dissociating complexes) and applied to the discs with vacuum applied.
Under low salt conditions the DNA:protein complex sticks to the filter while
free DNA passes through. The discs are placed in scintillation counting fluid
(New England Nuclear), and the cpm determined using a scintillation counter.

This technique has been adapted to 96-well and 72-slot nitrocellulose
filtration plates (Schleicher and Schuell) using the above protocol except (i)
the reaction dilution and wash volumes are reduced, and (ii) the flow rate
through the filter is controlled by adjusting the vacuum pressure. This method
greatly increases the number of assay samples that can be analyzed. Using
radioactive oligonucleotides, the samples are applied to nitrocellulose
filters, the filters are exposed to x-ray film, then analyzed using a Molecular
Dynamics scanning densitometer. This system transfers data directly into
analytical software programs (e.g., Excel) for analysis and graphic display.

Example 4 
Functional UL9 Binding Assay 

A. Functional DNA-Binding Activity Assay.

Purified protein was tested for functional activity using band-shift
assays. Radiolabelled oligonucleotides (prepared as in Example 1B) that
contain the 11 bp recognition sequence were mixed with the UL9 protein in
binding buffer (optimized reaction conditions: 0.1 ng 32P-DNA, 1 µl UL9
extract, 20 mM HEPES, pH 7.2, 50 mM KCl, and 1 mM DTT). The reactions were
incubated at room temperature for 10 minutes (binding occurs in less than 2
minutes), then separated electrophoretically on 4-10% non-denaturing
polyacrylamide gels. UL9-specific binding to the oligonucleotide is indicated
by a shift in mobility of the oligonucleotide on the gel in the presence of the
UL9 protein but not in its absence. Bacterial extracts containing (+) or
without (-) UL9 protein and affinity purified UL9 protein were tested in the
assay. Only bacterial extracts containing UL9 or affinity purified UL9 protein
generate the gel band-shift indicating protein binding.

The amount of extract that needed to be added to the reaction mix, in
order to obtain UL9 protein excess relative to the oligonucleotide, was
empirically determined for each protein preparation/extract. Aliquots of the
preparation were added to the reaction mix and treated as above. The quantity
of extract at which the majority of the labelled oligonucleotide appears in the
DNA:protein complex was evaluated by band-shift or filter binding assays. The
assay is most sensitive under conditions in which the minimum amount of protein
is added to bind most of the DNA. Excess protein decreases the sensitivity of
the assay with respect to the ability of inhibitors to compete with the protein
for oligonucleotide binding, except when protein concentrations are so high
that non-specific protein/DNA binding is provoked.
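The empirical titration described above can be summarized as choosing the smallest amount of extract that still shifts most of the labelled oligonucleotide into the complex. The Python sketch below illustrates that selection. The titration values, the function name `minimal_extract_volume`, and the 0.8 threshold are assumptions made for illustration only; the text specifies only that "most" of the DNA should be bound.

```python
def minimal_extract_volume(titration, threshold=0.8):
    """Return the smallest extract volume whose fraction of bound probe
    meets `threshold`, or None if no tested volume reaches it.

    `titration` is a list of (volume_ul, fraction_bound) pairs.
    """
    eligible = [volume for volume, fraction in titration if fraction >= threshold]
    return min(eligible) if eligible else None


# Hypothetical titration data (not measured values).
example_titration = [(0.25, 0.30), (0.5, 0.62), (1.0, 0.85), (2.0, 0.97)]
chosen_volume = minimal_extract_volume(example_titration)
```

Choosing the minimal volume, and not the volume giving the highest fraction bound, reflects the point made above that excess protein reduces the sensitivity of the assay to competing inhibitors.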

B. Rate of Dissociation.

The rate of dissociation is determined using a competition assay. An
oligonucleotide having the sequence presented in Figure 4, which contained the
binding site for UL9 (SEQ ID NO: 614), was radiolabelled with 32P-ATP and
polynucleotide kinase (Bethesda Research Laboratories). The competitor DNA was
a 17 base pair oligonucleotide (SEQ ID NO: 616) containing the binding site for
UL9.

In the competition assays, the binding reactions (Example 4A) were
assembled with each of the oligonucleotides and placed on ice. Unlabelled
oligonucleotide (1 µg) was added 1, 2, 4, 6, or 21 hours before loading the
reaction on an 8% polyacrylamide gel (run in TBE buffer (Maniatis, et al.)) to
separate the reaction components. The dissociation half-life, under these
conditions, for both the truncated UL9 (UL9-COOH) and the full length UL9 is
approximately 4 hours at 4°C. In addition, random oligonucleotides (a 10,000-
fold excess) that did not contain the UL9 binding sequence and sheared herring
sperm DNA (a 100,000-fold excess) were tested: neither of these control DNAs
competed for binding with the oligonucleotide containing the UL9 binding site.
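One way such a time course could be reduced to a single half-life is sketched below in Python. The sketch assumes that the complex decays as a single exponential once excess unlabelled competitor prevents rebinding of the labelled probe; that kinetic model, the function name `half_life`, and the synthetic data are assumptions for illustration and are not taken from the experiments described here.

```python
import math


def half_life(times, fraction_bound):
    """Estimate the dissociation half-life from a competition time course.

    Fits ln(fraction bound) against time by ordinary least squares, assuming
    first-order decay f(t) = f0 * exp(-k_off * t), and returns ln(2) / k_off
    in the same units as `times`.
    """
    logs = [math.log(f) for f in fraction_bound]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k_off = -sxy / sxx  # the slope of the fit is -k_off
    return math.log(2) / k_off


# Synthetic data generated with a 4-hour half-life (not measured values),
# sampled at the competitor incubation times used in the text.
times_h = [1, 2, 4, 6, 21]
synthetic = [0.5 ** (t / 4.0) for t in times_h]
t_half = half_life(times_h, synthetic)
```

With real band-shift or filter-binding data, the fraction bound at each time point would be the signal in the complex divided by the signal at time zero, and scatter in the data would make the fitted value approximate.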

C. Optimization of the UL9 Binding Assay.
1. Truncated UL9 from the Bacterial Expression System.

The effects of the following components on the binding and
dissociation rates of UL9-COOH with its cognate binding site have been tested
and optimized: buffering conditions (including the pH, type of buffer, and
concentration of buffer); the type and concentration of monovalent cation;
the presence of divalent cations and heavy metals; temperature; various
polyvalent cations at different concentrations; and different redox reagents at
different concentrations. The effect of a given component was evaluated
starting with the reaction conditions given above and based on the dissociation
reactions described in Example 4B.
The optimized conditions used for the binding of UL9-COOH contained in
bacterial extracts (Example 2E) to oligonucleotides containing the HSV ori
sequence (SEQ ID NO: 601) were as follows: 20 mM HEPES, pH 7.2, 50 mM KCl, 1 mM
DTT, 0.005 - 0.1 ng radiolabeled (specific activity, approximately 10^8 cpm/µg)
or digoxigenated, biotinylated oligonucleotide probe, and 5-10 µg crude
UL9-COOH protein preparation (1 mM EDTA is optional in the reaction mix). Under
optimized conditions, UL9-COOH binds very rapidly and has a dissociation
half-life of about 4 hours at 4°C with non-biotinylated oligonucleotide and
5-10 minutes with biotinylated oligonucleotides. The dissociation rate of
UL9-COOH changes markedly under different physical conditions. Typically, the
activity of a UL9 protein preparation was assessed using the gel band-shift
assay and related to the total protein content of the extract as a method of
standardization. The addition of herring sperm DNA depended on the purity of
UL9 used in the experiment. Binding assays were incubated at 25°C for 5-30
minutes.

2. Full Length UL9 Protein from the Baculovirus System.

The binding reaction conditions for the full length baculovirus-
produced UL9 polypeptide have also been optimized. The optimal conditions for
the current assay were determined to be as follows: 20 mM HEPES; 100 mM NaCl;
0.5 mM dithiothreitol; 1 mM EDTA; 5% glycerol; from 0 to 10 -fold excess of
sheared herring sperm DNA; 0.005 - 0.1 ng radiolabeled (specific activity,
approximately 10^8 cpm/µg) or digoxigenated, biotinylated oligonucleotide probe,
and 5-10 µg crude UL9 protein preparation. The full length protein also binds
well under the optimized conditions established for the truncated UL9-COOH
protein.
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Example 5 

The Effect of Test Sequence Variation on the 
Half -Life of the UL9 DNA: Protein Complex 

The oligonucleotides shown in Figure 5 were radiolabeled as described
above. The competition assays were performed as described in Example 4B using
UL9-COOH. Radiolabeled oligonucleotides were mixed with the UL9-COOH protein
in binding buffer (typical reaction: 0.1 ng oligonucleotide 32P-DNA, 1 µl
UL9-COOH extract, 20 mM HEPES, pH 7.2, 50 mM KCl, 1 mM EDTA, and 1 mM DTT).
The reactions were incubated at room temperature for 10 minutes. A zero time
point sample was then taken and loaded onto an 8% polyacrylamide gel (run
using TBE). One µg of the unlabelled 17 bp competitive DNA oligonucleotide
(SEQ ID NO:616) (Example 4B) was added at 5, 10, 15, 20, or 60 minutes before
loading the reaction sample on the gel. The results of this analysis are shown
in Figure 9: the test sequences that flank the UL9 binding site (SEQ ID NO:605
to SEQ ID NO:613) are very dissimilar but have little effect on the off-rate
of UL9. Accordingly, these results show that the UL9 DNA binding protein is
effective to bind to a screening sequence in duplex DNA with a binding
affinity that is substantially independent of test sequences placed adjacent
to the screening sequence. Filter binding experiments gave the same result.
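
The off-rate comparison above can be put in numerical form. The following is a minimal sketch, assuming that dissociation in the presence of excess competitor follows first-order kinetics and that band intensities have been normalized to the zero time point. Neither assumption is stated in this example, and the function name `half_life_minutes` and the data values are hypothetical.

```python
import math


def half_life_minutes(times, fractions_bound):
    """Estimate a half-life from a competition time course.

    Fits ln(fraction bound) = -k * t by least squares through the origin
    (the zero time point is defined as fraction 1.0) and returns ln(2) / k.
    """
    num = sum(t * math.log(f) for t, f in zip(times, fractions_bound))
    den = sum(t * t for t in times)
    k = -num / den  # first-order off-rate constant, per minute
    if k <= 0:
        raise ValueError("no net dissociation observed")
    return math.log(2) / k


# Synthetic, hypothetical data: competitor incubation times (minutes) and
# normalized band intensities generated with k = 0.05 per minute.
times = [5, 10, 15, 20, 60]
fractions = [math.exp(-0.05 * t) for t in times]
t_half = half_life_minutes(times, fractions)
```

Comparing the estimated half-life across oligonucleotides with different flanking sequences would give a single number per test sequence in place of a visual comparison of gel bands.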
Example 6

The Effect of Actinomycin D, Distamycin A, and
Doxorubicin on UL9 Binding to the Screening Sequence
is Dependent on the Specific Test Sequence

Different oligonucleotides, each of which contained the screening sequence
(SEQ ID NO:601) flanked on the 5' and 3' sides by a test sequence (SEQ
ID NO:605 to SEQ ID NO:613), were evaluated for the effects of distamycin A,
actinomycin D, and doxorubicin on UL9-COOH binding.

Binding assays were performed as described in Example 5. The oligonucle- 
otides used in the assays are shown in Figure 5. The assay mixture was allowed 
to pre-equilibrate for 15 minutes at room temperature prior to the addition of 
drug. 

A concentrated solution of Distamycin A was prepared in dH2O and was
added to the binding reactions at the following concentrations: 0, 1 µM, 4 µM,
16 µM, and 40 µM. The drug was added and incubated at room temperature for 1
hour. The reaction mixtures were then loaded on an 8% polyacrylamide gel
(Example 5) and the components separated electrophoretically. Autoradiographs
of these gels are shown in Figure 10A. The test sequences tested were as
follows: UL9 polyT, SEQ ID NO:609; UL9 CCCG, SEQ ID NO:605; UL9 GGGC, SEQ ID
NO:606; UL9 polyA, SEQ ID NO:608; and UL9 ATAT, SEQ ID NO:607. These results
demonstrate that Distamycin A preferentially disrupts binding to UL9 polyT, UL9
polyA and UL9 ATAT.

A concentrated solution of Actinomycin D was prepared in dH2O and was
added to the binding reactions at the following concentrations: 0 µM and 50 µM.
The drug was added and incubated at room temperature for 1 hour. Equal
volumes of dH2O were added to the control samples. The reaction mixtures were
then loaded on an 8% polyacrylamide gel (Example 5) and the components
separated electrophoretically. Autoradiographs of these gels are shown in
Figure 10B. In addition to the test sequences tested above with Distamycin A,
the following test sequences were also tested with Actinomycin D: ATori1, SEQ
ID NO:611; oriEco2, SEQ ID NO:612; and oriEco3, SEQ ID NO:613. These results
demonstrate that actinomycin D preferentially disrupts the binding of UL9 to
the oligonucleotides UL9 CCCG and UL9 GGGC.

A concentrated solution of Doxorubicin was prepared in dH2O and was added
to the binding reactions at the following concentrations: 0 µM, 15 µM and 35
µM. The drug was added and incubated at room temperature for 1 hour. Equal
volumes of dH2O were added to the control samples. The reaction mixtures were
then loaded on an 8% polyacrylamide gel (Example 5) and the components
separated electrophoretically. Autoradiographs of these gels are shown in
Figure 10C. The same test sequences were tested as for Actinomycin D. These
results demonstrate that Doxorubicin preferentially disrupts the binding of UL9
to the oligonucleotides UL9 polyT, UL9 GGGC, oriEco2, and oriEco3. Doxorubicin
appears to particularly disrupt the UL9:screening sequence interaction when the
test sequence oriEco3 is used. The test sequences for oriEco2 and oriEco3
differ by only one base: an additional T residue inserted at position 12
(compare SEQ ID NO:612 and SEQ ID NO:613).

Example 7

Use of the Biotin/Streptavidin Reporter System

A. The Capture of Protein-Free DNA.

Several methods have been employed to sequester unbound DNA from
DNA:protein complexes.

1. Magnetic Beads.

Streptavidin-conjugated superparamagnetic polystyrene beads
(Dynabeads M-280 Streptavidin, Dynal AS, 6-7x10^8 beads/ml) are washed in
binding buffer then used to capture biotinylated oligonucleotides (Example 1).
The beads are added to a 15 µl binding reaction mixture containing binding
buffer and biotinylated oligonucleotide. The beads/oligonucleotide mixture is
incubated for varying lengths of time with the binding mixture to determine the
incubation period that maximizes capture of protein-free biotinylated
oligonucleotides. After capture of the biotinylated oligonucleotide, the beads
can be retrieved by placing the reaction tubes in a magnetic rack (96-well
plate magnets are available from Dynal). The beads are then washed.
2. Agarose Beads.

Biotinylated agarose beads (immobilized D-biotin, Pierce, Rockford,
IL) are bound to avidin by treating the beads with 50 ug/nl avidin in binding 
buffer overnight at 4°C. The beads are washed in binding buffer and used to 
capture biotinylated DNA. The beads are mixed with binding mixtures to capture 
biotinylated DNA. The beads are removed by centrifugation or by collection on 
a non-binding filter disc.

For either of the above methods, quantification of the presence of the
oligonucleotide depends on the method of labelling the oligonucleotide. If the
oligonucleotide is radioactively labelled: (i) the beads and supernatant can
be loaded onto polyacrylamide gels to separate DNA:protein complexes from the
bead:DNA complexes by electrophoresis, and autoradiography performed; or (ii)
the beads can be placed in scintillation fluid and counted in a scintillation
counter. Alternatively, presence of the oligonucleotide can be determined
using a chemiluminescent or colorimetric detection system.

B. Detection of Protein-Free DNA.

The DNA is end-labelled with digoxigenin-11-dUTP (Example 1). The
antigenic digoxigenin moiety is recognized by an antibody-enzyme conjugate,
anti-digoxigenin-alkaline phosphatase (Boehringer Mannheim, Indianapolis, IN).
The DNA/antibody-enzyme conjugate is then exposed to the substrate of choice. 
The presence of dig-dUTP does not alter the ability of protein to bind the DNA 
or the ability of streptavidin to bind biotin. 

1. Chemiluminescent Detection.

Digoxigenin-labelled oligonucleotides are detected using the
chemiluminescent detection system "SOUTHERN LIGHTS" developed by Tropix, Inc.
(Bedford, MA). Use of this detection system is illustrated in Figures 11A and
11B. The technique can be applied to detect DNA that has been captured on 
either beads or filters. 

Biotinylated oligonucleotides, which have terminal digoxigenin-containing
residues (Example 1), are captured on magnetic (Figure 11A) or agarose beads
(Figure 11B) as described above. The beads are isolated and treated to block
non-specific binding by incubation with I-Light blocking buffer (Tropix) for 30
minutes at room temperature. The presence of oligonucleotides is detected
using alkaline phosphatase-conjugated antibodies to digoxigenin.
Anti-digoxigenin-alkaline phosphatase (anti-dig-AP, 1:5000 dilution of 0.75
units/µl, Boehringer Mannheim) is incubated with the sample for 30 minutes,
decanted, and the sample washed with 100 mM Tris-HCl, pH 7.5, 150 mM NaCl. The
sample is pre-equilibrated with 2 washes of 50 mM sodium bicarbonate, pH 9.5, 1
M MgCl2, then incubated in the same buffer containing 0.25 mM
3-(2'-spiroadamantane)-4-methoxy-4-(3'-phosphoryloxy)-phenyl-1,2-dioxetane
disodium salt (AMPPD) for 5 minutes at room temperature. AMPPD was developed
(Tropix Inc.) as a chemiluminescent substrate for alkaline phosphatase. Upon
dephosphorylation of AMPPD the resulting compound decomposes, releasing a
prolonged, steady emission of light at 477 nm.

Excess liquid is removed from filters and the emission of light occurring
as a result of the dephosphorylation of AMPPD by alkaline phosphatase can be 
measured by exposure to x-ray film or by detection in a luminometer. 

In solution, the bead-DNA-anti-dig-AP is resuspended in "SOUTHERN LIGHT" 
assay buffer and AMPPD and measured directly in a luminometer. Large scale 
screening assays are performed using a 96-well plate-reading luminometer 
(Dynatech Laboratories, Chantilly, VA). Subpicogram quantities of DNA (10 to
10* attomoles (an attomole is 10^-18 moles)) can be detected using the Tropix
system in conjunction with the plate-reading luminometer.

2. Colorimetric Detection.

Standard alkaline phosphatase colorimetric substrates are also
suitable for the above detection reactions. Typical substrates include
4-nitrophenyl phosphate (Boehringer Mannheim). Results of colorimetric assays
can be evaluated in multiwell plates (as above) using a plate-reading
spectrophotometer (Molecular Devices, Menlo Park, CA). The light emission
system is more sensitive than the colorimetric systems.
Example 8

Labelling Test Oligonucleotides to
Equivalent Specific Activities

The top strands of 256 oligonucleotides, containing all possible 4 bp
sequences in the test sites flanking the UL9 recognition site, were synthesized.
The oligonucleotides were composed of identical sequences except for the 4 bp
sites flanking either side of the UL9 recognition sequence (SEQ ID NO:601).
The oligonucleotides had the general sequence presented in Figure 14B (SEQ ID
NO:617), where XXXX is the test sequence and N = A, G, C, or T. A 12 bp primer
sequence, which is the complementary sequence to the 3'-end of the test
oligonucleotide, was also synthesized: the primer was designated the HSV primer
and is presented as SEQ ID NO:618.

The HSV primer was used to prime second strand synthesis and to
facilitate labeling the oligonucleotides to the same specific activity.
Oligonucleotide labelling was accomplished by labeling the 5' end of the HSV
primer and then using the same primer to prime second strand synthesis of all
256 test oligonucleotides. The 5' end of the primer can be labeled with
radioisotopes such as 32P, 33P, or 35S, or with non-radioactive detection
systems such as digoxigenin or biotin as discussed in the Capture/Detection
section.

Radioactive labeling of the primer with 32P is accomplished by the
enzymatic transfer of a radioactive phosphate from γ-32P-ATP to the 5' end of
the primer oligonucleotide using T4 polynucleotide kinase (Ausubel, et al.).
For labeling 256 oligonucleotides, approximately 60 µg HSV primer was labeled
as follows. The oligonucleotide was incubated for 1 hour at 37°C with 125 µl
γ-32P-ATP (20 mCi total, 7000 Ci/mmol) and 600 units of T4 polynucleotide
kinase in a 3 ml reaction volume containing 50 mM Tris-HCl, pH 7.5, 10 mM
MgCl2, 10 mM spermidine, and 1.5 mM dithiothreitol (freshly prepared). To stop
the reaction, EDTA was added to a final concentration of 20 mM. Unincorporated
nucleotides were removed using "G-25 SEPHADEX" chromatography in 10 mM
Tris-HCl, pH 7.5, 50 mM NaCl, and 1 mM EDTA (TE+50).

The radioactive primer was individually annealed to the top strand of
each of the 256 test oligonucleotides. The bottom strand is synthesized using
deoxyribonucleotides and Klenow fragment or T4 polymerase (Ausubel, et al.).
The annealing mixture typically contained 200 ng HSV primer mixed with 1 µg top
strand in 20 mM Tris-HCl, pH 7.5, 1 mM spermidine, and 0.1 mM EDTA (35 µl
reaction volume). The primer was annealed to the top strand by incubating the
sample for 2 minutes at 70°C, then placing the sample at room temperature or on
ice. To the annealing mixture, 4.5 µl 10X Klenow buffer (10X = 200 mM
Tris-HCl; 500 mM NaCl; 50 mM MgCl2; 10 mM dithiothreitol), 5 µl 0.5 mM each
dNTP (dATP, dCTP, dGTP, dTTP), and 1 µl Klenow fragment were added. This
reaction mixture was incubated 30-60 minutes at room temperature (or up to
37°C).

The volume of the reaction mixture was increased by adding 75 µl of a
solution of 10 mM Tris-HCl, pH 7.5, 50 mM NaCl, and 10 mM EDTA. The reaction
mixture was applied to a 1 ml "G-25 SEPHADEX" (in TE+50) spin column. The spin
columns were prepared by plugging 1 cc tuberculin syringes with silanized glass
wool and adding a slurry of "G-25 SEPHADEX." The columns were prespun at 2000
rpm in a tabletop centrifuge for 4 minutes. The samples (reaction mixtures)
were passed through the column by centrifugation (2000 rpm, 4 minutes at room
temperature) to remove unincorporated deoxyribonucleotides. The incorporation
of 32P was measured by placing a small volume of the sample in scintillation
fluor and determining the disintegrations per minute (dpms) in a scintillation
counter.

The radiolabeled double-stranded oligonucleotides were then diluted to 
the same specific activity (equal dpms per volume). Typically, a concentration
of 0.1 to 1 ng/µl oligonucleotide was used in the assay.

The same procedure can be used for second strand synthesis and labeling 
to equal specific activity regardless of the type of label on the HSV primer. 
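
The dilution to a common activity per volume can be sketched as a small calculation. This is a minimal illustration, assuming each labeled oligonucleotide has a measured activity in dpm per µl; the function name `dilution_factors` and the example counts are hypothetical and are not taken from the procedure above.

```python
def dilution_factors(dpm_per_ul, target=None):
    """Return, for each sample, the fold-dilution that brings it to a common dpm/µl.

    If no target is given, the weakest sample sets the target, so that no
    sample would need to be concentrated.
    """
    if not dpm_per_ul:
        raise ValueError("no samples given")
    if any(value <= 0 for value in dpm_per_ul.values()):
        raise ValueError("activities must be positive")
    weakest = min(dpm_per_ul.values())
    if target is None:
        target = weakest
    if target > weakest:
        raise ValueError("target exceeds the weakest sample")
    return {name: value / target for name, value in dpm_per_ul.items()}


# Hypothetical scintillation counts (dpm per µl) for three test oligonucleotides.
measured = {"TTCC": 52000.0, "TTAC": 40000.0, "ACGG": 65000.0}
factors = dilution_factors(measured)
```

A factor of 1.0 means the sample is used undiluted; a factor of 1.625 means one volume of sample is brought to 1.625 volumes with buffer.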
Example 9

An Arrayed Sample Format

Screening large numbers of test molecules or test sequences is most
easily accomplished in an arrayed sample format, for example, a 96-well plate
format. Such formats are readily amenable to automation using robotics
systems. Several different types of disposable plastic plates are available
for use in screening assays including the following: polyvinyl chloride (PVC),
polypropylene (PP), polyethylene (PE), and polystyrene (PS) plates. Plates, or
any testing vehicle in which the assay is performed, are tested for protein and
DNA adsorption and coated with a blocking reagent if necessary.

One method for testing protein or DNA adsorption to plates is to place
assay mixtures in the wells of the plates for varying lengths of time. Samples
are then removed from the wells and a nitrocellulose dot blot capture system
(Ausubel, et al.; Schleicher and Schuell) is used to measure the amount of
DNA:protein complex remaining in the mixture over time.

When radiolabeled oligonucleotides are used for the test, signal can be
measured using autoradiography and a scanning laser densitometer. A decrease
in the amount of DNA:protein complex in the absence of competitor molecules is
indicative of plate adsorption. If plate adsorption occurs, the plates are
coated with a blocking agent prior to use in the assay.

None of the plates listed above showed marked adsorption at a 30 minute 
time point under the conditions of the assay. However, most plates, regardless 
of brand, showed significant adsorption at times greater than 2 hours. 

Coating the plates with a blocking agent decreases variability in the
assay. Several types of blocking reagents typically used to block the
adsorption of macromolecules to plastic are known, primarily from
immunoscreening procedures. For example, plates may be blocked with either 1%
bovine serum albumin (BSA) in phosphate-buffered saline (PBS), or 0.1% gelatin,
0.05% "TWEEN20" in PBS.

To test for the effectiveness of using such blocking reagents, the plates
were treated with the above reagents for 1 hour at room temperature, then
washed three times with 0.05% "TWEEN20" in PBS and once with the assay buffer.
Assay reaction mixtures were aliquoted to the plates and tested as described
above using dot blot capture assays. Both of the blocking reagents (BSA or
gelatin) were effective in blocking DNA and protein binding, except when
polypropylene plates were used. Based on these experiments, PVC plates blocked
with BSA were determined to work well in the assay of the present invention.

Plates were tested for inter- and intra-plate variability by aliquoting
duplicate samples to all 96 wells of several plates, and determining the amount
of DNA:protein complex recovered using the dot blot/nitrocellulose system. The
coefficient of variation [%CV = (standard deviation/mean) x 100] was
calculated for intra-plate variability (i.e., between samples on the same
plate) and inter-plate variability (i.e., between plates). Blocked PVC plates
showed an intra-plate %CV of 5-20%; inter-plate variability was about 8%.
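
The %CV calculation can be sketched as follows. This is a minimal illustration only: the text does not say whether the sample or the population standard deviation was used, so the sample form is assumed here, and the inter-plate value is assumed to be the %CV of the per-plate means. The well values are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(values):
    """%CV = (standard deviation / mean) x 100, using the sample standard deviation."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical densitometry readings for replicate wells on two plates.
plates = {
    "plate_1": [100.0, 110.0, 90.0, 105.0],
    "plate_2": [95.0, 100.0, 105.0, 100.0],
}

# Intra-plate variability: spread of the wells within each plate.
intra_plate_cv = {name: percent_cv(wells) for name, wells in plates.items()}

# Inter-plate variability (assumed definition): spread of the plate means.
inter_plate_cv = percent_cv([mean(wells) for wells in plates.values()])
```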

Example 10

Sequence Selectivity and Relative Binding
Affinity for Distamycin

Using the assay method of the present invention, distamycin was tested
for sequence selectivity and relative binding affinity to 256 different 4 bp
sequences.

A. The Assay Mixture.

Water, buffer and UL9 were mixed on ice and aliquoted to the wells of a
96-well plate. The addition of water/UL9/buffer mix was accomplished with an
8-channel repipettor, which holds a relatively large volume and allowed rapid,
accurate pipetting to all 96 wells of a master experimental plate.

Radiolabeled double-stranded oligonucleotides were aliquoted from 96-well
master stock plates (containing the array of all 256 oligonucleotides diluted
to the same specific activity) to the wells of the master experimental plates.

Master assay mixtures in the master experimental plates were thoroughly
mixed by pipetting up and down. The mixtures were aliquoted to the test
plates. Each test plate typically included one sample as a control (no test
molecules added) and as many test samples as were needed for different test
molecules or test molecule concentrations. There were 3 master oligonucleotide
stock plates, containing the array of 256 oligonucleotides. Accordingly, an
experiment testing distamycin at different concentrations would require 256
control assays (one for each oligonucleotide) and 256 assays at each of the
drug concentrations to be tested.
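
The bookkeeping in the preceding paragraph can be checked with a short calculation. This is a sketch under simple assumptions: one well per assay, no duplicates, and standard 96-well plates. The function name `assay_counts` is hypothetical.

```python
import math

N_OLIGOS = 256        # all possible 4 bp test sequences (4 ** 4)
WELLS_PER_PLATE = 96


def assay_counts(n_drug_concentrations):
    """Return (total assays, plates needed per condition) for one test molecule."""
    conditions = 1 + n_drug_concentrations  # the control plus each concentration
    total_assays = N_OLIGOS * conditions
    plates_per_condition = math.ceil(N_OLIGOS / WELLS_PER_PLATE)
    return total_assays, plates_per_condition


total, plates = assay_counts(3)
```

With 256 oligonucleotides and 96 wells per plate, three plates are needed to hold one full array, which agrees with the three master stock plates described above.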

The following assay mixture was used for testing distamycin in the assay
of the present invention: 1.5 nM radiolabeled DNA and 12.8 nM UL9-COOH protein
prepared as described above in the UL9 binding buffer (20 mM Hepes, pH 7.2, 50
mM KCl, and 1 mM dithiothreitol). The concentration of the components in the
assay mixture can be varied as described above in the Detailed Description.

Assay mixtures containing both UL9 and DNA were incubated at room
temperature for at least 10 minutes to allow the DNA:protein complexes to form
and for the system to come to equilibrium. At time = 0, the assay was begun by
adding water (control samples) or distamycin (5-15 µM, test samples) to the
assay mixtures using a 12-channel micropipettor. After incubation with drug
for 5-120 minutes, samples were taken and applied to nitrocellulose on a
96-well dot blot apparatus (Schleicher and Schuell). The samples were held at
4°C.

Tests were performed in duplicate. Typically, one set of 256 test
oligonucleotides was scrambled with respect to location on the 96-well plate to
eliminate any effects of plate location.

B. The Capture/Detection System.

A 96-well dot blot apparatus was used to capture the DNA:protein
complexes on a nitrocellulose filter. The filters used in the dot blot
apparatus were pretreated as follows. The nitrocellulose filter was pre-wetted
with water and soaked in UL9 binding buffer. The filter was then placed on 1
to 3 pieces of 3MM filter paper, which were also presoaked in UL9 binding
buffer. All filters were chilled to 4°C prior to placement in the apparatus.

Prior to the application of the assay sample to the wells of the dot blot
apparatus, the wells were filled with 375 µl of UL9 binding buffer. Typically,
5-50 µl of sample (usually 10-15 µl) were pipetted into the wells containing
binding buffer and a vacuum applied to the system to pull the sample through
the nitrocellulose. Unbound DNA passes through the nitrocellulose; protein-bound
DNA sticks to the nitrocellulose. The filters were dried and exposed to
X-ray film to generate autoradiographs.

C. Quantitation of Data.

The autoradiographs of the nitrocellulose filters were analyzed with a
Molecular Dynamics (Sunnyvale, CA) scanning laser densitometer using an
ImageQuant software package (Molecular Dynamics). Using this software, a
96-well grid was placed on the image of the autoradiograph and the densitometer
calculated the "volume" of each dot ("volume" is equivalent to the density of
each pixel in the grid square multiplied by the area of the grid square). The
program automatically subtracts background. The background was determined by
either the background of a line or object drawn outside the grid or by using
the gridlines as background for each individual dot.

The data is exported to a spreadsheet program, such as "EXCEL" (Microsoft
Corporation, Redmond, WA) for further analysis.

D. Analysis of Data.

The data generated from the densitometry analysis was analyzed using the
spreadsheet program "EXCEL."

For each test oligonucleotide, at each drug concentration and/or each
time point, a raw % score was calculated. The raw % score (r%) can be
described as

r% = (T/C) x 100

where T was the densitometry volume of the test sample and C was the
densitometry volume of the control sample. The oligonucleotides were then
ranked from 1 to 256 based on their r% score. Further calculations were based
on the rank of each oligonucleotide with respect to all other oligonucleotides.
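
The scoring and ranking step can be sketched as follows. This is a minimal illustration with hypothetical densitometry volumes; rank 1 is assigned to the lowest r% (the convention stated in Example 12), and breaking ties alphabetically is an assumption made only so that the output is deterministic.

```python
def raw_percent_scores(test_volumes, control_volumes):
    """r% = (T / C) x 100 for every test oligonucleotide."""
    return {
        seq: 100.0 * test_volumes[seq] / control_volumes[seq]
        for seq in test_volumes
    }


def rank_by_score(scores):
    """Rank sequences from 1 (lowest r%) upward; ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda seq: (scores[seq], seq))
    return {seq: position + 1 for position, seq in enumerate(ordered)}


# Hypothetical densitometry volumes for three of the 256 test sequences.
test = {"TTCC": 200.0, "GGGC": 900.0, "TTAC": 300.0}
control = {"TTCC": 1000.0, "GGGC": 1000.0, "TTAC": 1000.0}

r_percent = raw_percent_scores(test, control)
ranks = rank_by_score(r_percent)
```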

The rank of each oligonucleotide was averaged over several experiments
(where one experiment is equivalent to testing all 256 test oligonucleotides by
the assay of the present invention) in view of the variability in rank between
any two experiments. The confidence level for the ranking of the
oligonucleotides increased with repetition of the experiment.

Figure 15 shows the results of 4 separate experiments with distamycin. 
The test samples were treated with 10 µM distamycin for 30 minutes. The r%
scores are shown for each of the 4 experiments (labeled 918A, 918B, 1022A, and 
1022B) and the ranks of each oligonucleotide in each experiment are shown. The 
test oligonucleotides have been ranked from 1 to 256 based on their average 
rank. The average rank was the sum of the ranks in the individual experiments 
divided by the number of experiments. 
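
The averaging of ranks across experiments can be sketched as follows. The rank values below are hypothetical and cover only three sequences; the alphabetical tie-break in the final ranking is an assumption.

```python
def average_ranks(rank_tables):
    """Average rank = sum of the ranks in the individual experiments / number of experiments."""
    n_experiments = len(rank_tables)
    return {
        seq: sum(table[seq] for table in rank_tables) / n_experiments
        for seq in rank_tables[0]
    }


def final_ranking(avg_rank):
    """Order sequences by average rank; ties are broken alphabetically (an assumption)."""
    ordered = sorted(avg_rank, key=lambda seq: (avg_rank[seq], seq))
    return {seq: position + 1 for position, seq in enumerate(ordered)}


# Hypothetical ranks from four experiments.
experiments = [
    {"TTCC": 1, "TTAC": 2, "GGGC": 3},
    {"TTCC": 2, "TTAC": 1, "GGGC": 3},
    {"TTCC": 1, "TTAC": 2, "GGGC": 3},
    {"TTCC": 1, "TTAC": 3, "GGGC": 2},
]

avg = average_ranks(experiments)
final = final_ranking(avg)
```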

Figures 16 and 17 show the results presented in Figure 15 in graphic
form. Figure 16 shows the average ranks plotted against the ideal ranks 1 to 
256. Figure 17 shows the average r% scores plotted against the rank of 1 to 
256. These data demonstrate the reproducible ability of the assay to detect 
differential binding and effects of distamycin on different 4 bp sequences. 

Example 11 

Determining a Consensus Binding Site for Distamycin 

One method used to determine the sequence preferences for distamycin was 
to examine the sequences that rank highest in the assay for sequence 
similarities. This process may be accomplished visually or by designing 
computer programs to inspect the data. 

Using the data shown in Figure 15, consensus sequences can be constructed
for distamycin in the following manner. Sequences with rankings less than 50
(indicating a strong effect of distamycin on the test sequence) in all four
experiments were:

TABLE VI

Sequence        Rank
TTCC            1
TTAC            2
TACC            3
TATC            4
TTCG            6
ACGG            8



Sequences with rankings less than 50 (indicating a strong effect of
distamycin on the test sequence) in three of the four experiments were:

TABLE VII

Sequence        Rank
AACG            5
TTTC            7
TTAG            10
TAAC            12
TACG            15
AGAC            17
AAAC            18
AGCG            21
AGCC            22
TTCT            24
ACGC            25
AGGG            28
AGGC            30
TTGC            37
ATCG            39
TTTG            43



Sequences with rankings less than 50 (indicating a strong effect of
distamycin on the test sequence) in two of the four experiments were:

TABLE VIII

Sequence        Rank
TAGC            9
TTGG            11
AAAG            13
AACC            14
CAAC            16
ATCC            19
AAGG            20
TAAG            23
ACCC            26
TCCC            29
TATG            31
ACCG            32
TCGG            34
Ab it           J D
CTCG            38
AATC            44
AGAG            46
TTAA            47
ACAC            48
AGTG            49
TCAC            52



The following assumptions allow prediction of a consensus sequence for a
distamycin recognition sequence: (i) the most favored sequences are the test 
sequences that rank in the top 50 in all four experiments; (ii) the next 
favored sequences will be the test sequences that rank in the top 50 in 3 of 4 
experiments; and (iii) the next favored sequences will be the test sequences 
that rank in the top 50 in 2 of 4 experiments. 
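
The grouping that underlies these assumptions can be sketched as follows: for each test sequence, count the experiments in which its rank falls below the cutoff of 50. The per-experiment ranks shown are hypothetical, since the individual ranks appear only in Figure 15, and the function name `group_by_top_hits` is likewise an assumption.

```python
def group_by_top_hits(ranks_by_sequence, cutoff=50):
    """Group sequences by the number of experiments in which rank < cutoff."""
    groups = {}
    for seq, ranks in ranks_by_sequence.items():
        hits = sum(1 for rank in ranks if rank < cutoff)
        groups.setdefault(hits, []).append(seq)
    return groups


# Hypothetical ranks of three sequences in four experiments.
ranks_by_sequence = {
    "TTCC": [1, 3, 2, 5],          # below 50 in all four experiments
    "AACG": [4, 60, 10, 7],        # below 50 in three of four
    "GGGG": [200, 250, 180, 240],  # never below 50
}

groups = group_by_top_hits(ranks_by_sequence)
```

Sequences in `groups[4]` correspond to the most favored class, `groups[3]` to the next, and `groups[2]` to the third.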

The positions in the test sequence are represented by the numerals 1, 2,
3 and 4. One consensus sequence that is predicted from the above binding data
is:

1       2       3           4
T       T/A     N           C/G

The nucleotides at each position can also be ranked:

1       2       3           4
T       T>A     C>A>T>G     C>G

Furthermore, the importance of the position of the nucleotide can be 
ranked. Examination of this data would indicate that the importance of the 
positions is 

1 > 4 > 2 > 3. 

These data can be tested for validity by deriving all possible consensus 
sequences and examining their scores in the assay. The consensus sequences 
derived from the above information, in order of rank as predicted by the 
consensus sequence, are: 



TABLE IX

Sequence        Predicted Rank  Actual Rank
TTCC            1               1
TACC            2               3
TTCG            3               6
TACG            4               15
TTAC            5               2
TAAC            6               12
TTAG            7               10
TAAG            8               23
TTTC            9               7
TATC            10              4
TTTG            11              43
TATG            12              31
TTGC            13              37
TAGC            14              9
TTGG            15              11
TAGG            16              58

Average rank:                   17



Note that the actual rank numbers are out of a possible 256 and that only
one number is greater than 50. The average rank of these 16 oligos is only 17.
These data indicate that the consensus sequence has predictive value.
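
The check described above can be reproduced with a short sketch that expands the consensus T, T/A, N, C/G into its 16 member sequences and averages their actual ranks. The rank values are those listed in Tables VI to IX; the function name `expand_consensus` and the pattern notation are assumptions made for illustration.

```python
from itertools import product


def expand_consensus(pattern):
    """Expand e.g. ["T", "T/A", "N", "C/G"] into every matching 4 bp sequence.

    "N" stands for any base; "X/Y" lists the allowed alternatives.
    """
    choices = ["ACGT" if p == "N" else p.replace("/", "") for p in pattern]
    return ["".join(bases) for bases in product(*choices)]


# Actual (average) ranks for distamycin, as listed in Tables VI to IX.
actual_rank = {
    "TTCC": 1, "TACC": 3, "TTCG": 6, "TACG": 15,
    "TTAC": 2, "TAAC": 12, "TTAG": 10, "TAAG": 23,
    "TTTC": 7, "TATC": 4, "TTTG": 43, "TATG": 31,
    "TTGC": 37, "TAGC": 9, "TTGG": 11, "TAGG": 58,
}

predicted = expand_consensus(["T", "T/A", "N", "C/G"])
average_rank = sum(actual_rank[seq] for seq in predicted) / len(predicted)
over_50 = [seq for seq in predicted if actual_rank[seq] > 50]
```

The 16 ranks sum to 272, giving an average of 17, and only TAGG ranks above 50, in agreement with the figures quoted in the text.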

Using the same data, a second consensus sequence can be derived that has 
slightly worse average rank with respect to the relative effect of distamycin 
in the assay. 
TABLE X

1       2       3       4
A       A/G/C   G/C/A   G/C
        A>G=C   C>A=G   G=C



The test sequences predicted by this consensus sequence are as follows: 



TABLE XI

Sequence        Actual Rank
AACG            5
AACC            14
AAAG            13
AAAC            18
AAGG            20
AAGC            74
AGCG            21
AGCC            22
AGAG            46
AGAC            17
AGGG            28
AGGC            30
ACCG            32
ACCC            26
ACAG            73
ACAC            48
ACGG            8
ACGC            25

Ave. rank:      29



This consensus sequence also appears to be predictive of favored
distamycin binding sites since the average rank of test oligonucleotides
predicted by this sequence is 29, substantially below the median rank of 128.
However, the sequences predicted by this consensus sequence do not appear to be
affected as strongly by distamycin as the sequences in the first consensus
sequence, described above.
Example 12

Testing Actinomycin D to Determine Sequence
Specificity and Relative Binding Affinity

A. Ranking of Actinomycin D Sequence Binding Affinities.

Actinomycin D has been tested for sequence selectivity and relative
binding affinity to the 256 different 4 bp sequences. The assay was performed
essentially as described in Example 10. One assay mixture useful for the
testing of actinomycin D contained 1.5 nM radiolabeled DNA and 12.8 nM UL9-COOH
protein prepared as described above in the UL9 binding buffer (20 mM Hepes, pH
7.2, 50 mM KCl, and 1 mM dithiothreitol). The concentration of the components
can be varied as described in the Detailed Description. The assay mixtures
containing both UL9 and DNA were incubated at room temperature for at least 10
minutes to allow the DNA:protein complexes to form and for the system to come
to equilibrium. At time = 0, the assay was begun by adding water (control
samples) or actinomycin D (25 µM, test samples) to the assay mixtures using a
12-channel micropipettor. After incubation with drug for 30 minutes, samples
were taken and applied to nitrocellulose filters using a 96-well dot blot
apparatus (Schleicher and Schuell) held at 4°C. Figure 18 shows the results of
8 screens of actinomycin D.

The percentage of DNA:protein complex remaining in the presence of
actinomycin D, relative to the control, is called "r%"; the lower the r% score,
the more effective the test molecule in blocking the DNA:protein interaction.
For each screen, the test oligonucleotides have been ranked from 1 to 256,
based on the r% score; the rank of 1 denotes the lowest r% score (the test
oligonucleotide most affected by the test molecule), the rank of 256 denotes
the highest r% score (the test oligonucleotide least affected by the test
molecule). The table also shows the average r% score and average rank of each
test oligonucleotide; the averages are calculated from the sum of the
individual scores and ranks, respectively, divided by the number of screens.
The test oligonucleotides are then ranked from 1 to 256 based on the average
rank in all screens. The final ranking is shown in the two external columns on
the table. Test oligonucleotides ranking less than 50 in any individual screen
are shown in highlighted boxes.

Figure 19 shows the final rank of test oligonucleotides screened with
actinomycin D plotted against the average r% score for these test
oligonucleotides.

Figure 20 shows the final ranking vs. the ranks in each individual 
experiment, the average rank, and the ideal rank. 

B. Analysis of the Data Obtained from Ranking Actinomycin D Sequence
Binding Affinities.

Several simple analytical procedures may be applied to the data from the
screens.

1. Position Effects.

First, to examine possible preferences of the test molecule for a
base at any particular position in the test site, the average r% scores are
examined. For each base at each position in the test site, the average r%
scores of the 64 test oligonucleotides carrying that base at that position are
averaged. For example, to determine the effect of having an A in the first
position of the test site, the "N1" position, the average r% scores for the 64
test oligonucleotides with A in the first position are averaged. The results
of this analysis are shown in Figure 21. The mean score for all
oligonucleotides in these screens was r% value 67; the standard deviation was
11.8.

If the r% score is expressed as variance from the mean, as shown in 
Figure 21, one observes that none of the scores is markedly deviant from the 
mean. These results suggest that a single base in any particular position has 
little impact on the binding of the actinomycin D to the test site. 
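
The position analysis can be sketched as follows. The scores used here are synthetic (a flat background with an artificial effect of G in the first position) and serve only to show the grouping; they are not the values of Figure 21, and the function name `position_means` is hypothetical.

```python
from itertools import product
from statistics import mean


def position_means(avg_r_percent):
    """Mean r% of the 64 sequences carrying each base at each of the 4 positions.

    Keys of the result are (position, base), with positions numbered 1 to 4.
    """
    result = {}
    for position in range(4):
        for base in "ACGT":
            scores = [
                score for seq, score in avg_r_percent.items()
                if seq[position] == base
            ]
            result[(position + 1, base)] = mean(scores)
    return result


# Synthetic scores for all 256 test sequences: 60 everywhere, 70 when the
# first base is G. These numbers are illustrative only.
all_sequences = ["".join(bases) for bases in product("ACGT", repeat=4)]
synthetic = {seq: 70.0 if seq[0] == "G" else 60.0 for seq in all_sequences}

means = position_means(synthetic)
```

Expressing each entry of `means` as a difference from the overall mean gives the kind of variance-from-the-mean comparison described for Figure 21.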

2 . Dinucleotide Analysis . 

The results of the actinomycin D screen were examined for the 
presence of dinucleotide pairs that scored well or poorly in the rankings. 
High scores indicate a preference for the test sequence. Low scores indicate a 
repulsion of actinomycin D for the test sequence. A dinucleotide analysis is 
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one of many simple analytical procedures that may be applied to the data to 
extract meaningful impressions about the nature of the sequences to which the 
test molecule has high affinity. 

The data are examined in a manner similar to that used for the single
nucleotide analysis. The 16 possible average r% scores for any particular
dinucleotide combination are examined. Specific adjacent dinucleotides (N1N2,
N2N3, N3N4) or adjacent dinucleotide pairs at any particular position
(N(x)N(x+1) = the average of N1N2, N2N3, and N3N4) may be examined, as well as
specific dinucleotide pairs that are not adjacent (N1N3, N2N4, N1N4) and any
dinucleotide pair separated by one base (N(x)N(x+2) = the average of N1N3 and
N2N4). The means for each set are determined as well as standard deviations.

The difference from the mean (i.e., the mean score less the average r%
score for any particular dinucleotide) reflects the extent of deviation from
the norm. Differences from the mean greater than 2-3 standard deviations from
the mean are considered to be significant. The data for the dinucleotide
analysis of actinomycin D are shown in Figure 22. The differences from the mean
are displayed graphically in Figure 23.
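
A minimal sketch of the dinucleotide calculation follows. The function names
and demonstration scores are hypothetical, and the use of the population
standard deviation over the 16 dinucleotide means is an assumption, since the
text does not say which form of standard deviation was used.

```python
from itertools import product
from statistics import pstdev

BASES = "ACGT"

def dinucleotide_means(scores, i, j):
    """Mean r% score for each dinucleotide at 1-based positions (i, j)."""
    groups = {}
    for seq, score in scores.items():
        groups.setdefault(seq[i - 1] + seq[j - 1], []).append(score)
    return {dinuc: sum(v) / len(v) for dinuc, v in groups.items()}

def flag_significant(dinuc_means, threshold=2.0):
    """Dinucleotides whose difference from the set mean exceeds
    `threshold` standard deviations of the set."""
    values = list(dinuc_means.values())
    set_mean = sum(values) / len(values)
    set_sd = pstdev(values)
    flagged = {}
    for dinuc, value in dinuc_means.items():
        diff = set_mean - value  # "the mean score less the average r% score"
        if set_sd > 0 and abs(diff) > threshold * set_sd:
            flagged[dinuc] = diff
    return flagged

# Invented demonstration data: all 4 bp sequences score 70, except those
# with GC in positions N1N2, which score 40.
demo_scores = {
    "".join(p): (40.0 if p[0] + p[1] == "GC" else 70.0)
    for p in product(BASES, repeat=4)
}
n1n2_means = dinucleotide_means(demo_scores, 1, 2)
n1n2_flagged = flag_significant(n1n2_means)
```

Calling `dinucleotide_means(scores, 1, 3)` or `(scores, 2, 4)` gives the
non-adjacent pairs in the same way.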

In reference to Figures 22 and 23, the dinucleotide preference of
actinomycin D is GC, particularly in the N1N2 position, but also at any
adjacent (N(x)N(x+1)) dinucleotide sequence in the test site.

If the data are combined in a single bar chart, shown in Figure 24,
where the cumulative results for any dinucleotide pair are tabulated in a
single bar, the overall observation can be made that actinomycin D prefers
GC-rich sequences over AT-rich sequences, with a particular preference for the
dinucleotide pairs involving GC.

Example 13 

A Method for Selecting Target Sites for DNA-Binding
Molecules that are Dimers or Trimers of Distamycin

Once the relative binding preferences of distamycin have been
determined, sequences are selected for target sites for DNA-binding molecules
composed of two distamycin molecules, bis-distamycins, or three distamycin
molecules, tris-distamycins.

A. Selecting Sequences for Binding with Highest Affinity to Distamycin
Oligomers.

The top binding sites for distamycin, determined as described above, are
defined by the consensus sequence 5'-T:T/A:C/A:C-3'; accordingly, the top
sequences are TTCC, TTAC, TACC and TAAC. Using this information, 2^4 = 16
possible dimer sequences, i.e., combinations of the four top binding sequences,
can be targeted by a bis-distamycin in which the distamycin molecules are
immediately adjacent to one another.
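
The expansion of the consensus sequence into the four top sites, and of those
sites into the 16 adjacent dimer targets, can be sketched as follows. The
names `expand_consensus` and `adjacent_dimers` are illustrative only.

```python
from itertools import product

# Consensus from the text: T, then T or A, then C or A, then C.
CONSENSUS = ["T", "TA", "CA", "C"]

def expand_consensus(consensus):
    """List every sequence allowed by a per-position set of bases."""
    return ["".join(p) for p in product(*consensus)]

def adjacent_dimers(sites):
    """Every ordered pair of sites joined with no spacer between them."""
    return [a + b for a, b in product(sites, repeat=2)]

top_sites = expand_consensus(CONSENSUS)
dimer_sites = adjacent_dimers(top_sites)
```

Here `top_sites` holds the four 4 bp sequences named in the text and
`dimer_sites` holds the 16 distinct 8 bp combinations.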

The top strands of the 16 possible duplex DNA target sites for binding
bis-distamycins are shown in Figure 25. Similarly, trimers of distamycin,
tris-distamycins, could be targeted toward selected 12 bp sequences, comprised
of all possible combinations of the four 4 bp sequences. There are 3^4 = 81
possible highest affinity target trimer sequences.

There are several advantages to targeting longer sequences with bis- or
tris-distamycin:

B. As the Number of Potential Target Sites Decreases, Specificity
Increases.

The consensus sequence used in this example predicts four favored target
sites for distamycin. This represents (4/4^4)*100 = about 1.6% of the possible
4 bp sites in the genome. Since there are 4^8 possible 8 bp sequences, the
favored dimer sites represent, on average, only (2^4/4^8)*100 = about 0.02% of
the total genome.
There are 4^12 possible 12 bp sequences; the favored trimer sites represent,
on average, only (3^4/4^12)*100 = about 0.0005% of the genome.

The following discussion provides perspective and illustrates the
improvement in the actual number of target sites in the human genome when
using a dimer of distamycin versus a monomer of distamycin. The human genome
is about 3 x 10^9 bp. If the number of favored target sites for distamycin is
four, and the number of possible 4 bp sequences is 4^4 = 256, then the number
of favored target sites in the genome is (4/256)(3 x 10^9) = 4.7 x 10^7, or
about 50 million favored target sites.

Given that the number of possible 8 bp sites is 4^8 = 65,536, if all
possible combinatorial 8 bp sites derived from the favored 4 bp sites (2^4 = 16;
Figure 25) are favored, then the number of favored 8 bp target sites is
(16/65,536)(3 x 10^9) = 7.3 x 10^5, or about 700,000 possible sites. This
represents a 64-fold reduction in the number of highest affinity target sites
between distamycin and bis-distamycin; alternatively, this result can be viewed
as a 64-fold increase in specificity.
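
The monomer and dimer estimates above can be checked with a short
calculation. This sketch assumes, as the text implicitly does, that every
sequence of a given length is equally likely at each position in the genome;
the function name is illustrative.

```python
GENOME_BP = 3e9  # approximate size of the human genome used in the text

def expected_favored_sites(n_favored, site_len, genome_bp=GENOME_BP):
    """Expected number of favored sites in the genome, assuming all
    sequences of length `site_len` occur with equal frequency."""
    return n_favored / 4 ** site_len * genome_bp

monomer_sites = expected_favored_sites(4, 4)    # 4 favored 4 bp sites
dimer_sites_expected = expected_favored_sites(16, 8)  # 16 favored 8 bp sites
fold_reduction = monomer_sites / dimer_sites_expected
```

The monomer value is about 4.7 x 10^7, the dimer value is about 7.3 x 10^5,
and their ratio is 64.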



Likewise, given that the number of possible 12 bp sites is 4^12 = 1.7 x
10^7, if all possible 12 bp sites derived from the favored 4 bp sites
(3^4 = 81) are favored, then the number of favored 12 bp target sites is
(81/1.7 x 10^7)(3 x 10^9) = 1.4 x 10^4, i.e., 14,000 possible highest affinity
sites. This represents an approximately 3000-fold decrease in the number of
highest affinity target sites between distamycin and tris-distamycin and an
approximately 50-fold decrease in the number of highest affinity target sites
between bis-distamycin and tris-distamycin.

C. An Exponential Increase in Affinity.

As the target site increases in size, (i) the number of target sites in a
defined number of nucleotides decreases, and (ii) the specificity increases.
Further, the affinity of binding is typically the product of the binding
affinities of component parts (see Section VI.E.1 above). As an example, the
published binding constant for distamycin to bulk genomic DNA is about 2 x 10^5
M^-1. Dimers of distamycin will have a theoretical binding affinity of the
square of the binding constant of distamycin:

(K_dista, average = 2 x 10^5 M^-1; K_bis-dista = (2 x 10^5 M^-1)^2 = 4 x 10^10 M^-1).

Trimers of distamycin will have binding affinities of the cube of the
binding affinity of distamycin:

(K_tris-dista = (2 x 10^5 M^-1)^3 = 8 x 10^15 M^-1).



Thus, if distamycin shows only a 10-fold higher affinity (2 x 10^6 M^-1) for
the top favored binding sites than the average binding sites in DNA, then the
affinity constant for bis-distamycin to an 8 bp site comprised of two favored
binding sites is 100-fold higher than for an 8 bp sequence comprised of two
average binding sites:

(K_bis-dista, favored sites / K_bis-dista, average sites = (2 x 10^6)^2/(2 x 10^5)^2 = 100).

While this does not represent absolute sequence specificity in binding, the
binding affinity is 100-fold greater for 0.02% (16/65,536) of the total
possible 8 bp target sequences.

The use of a trimer targeted sequence will afford an even higher increase
in affinity to the most favored binding sites:

K_tris-dista, favored sites / K_tris-dista, average sites = (2 x 10^6)^3/(2 x 10^5)^3 = 1000.

Thus, with only 10-fold differential activity in binding between favored sites
and average sites, a 1000-fold difference in affinity can be achieved by
designing trimer molecules to specific target sites. When considering the
administration of DNA-binding molecules as drugs, a 1000-fold lower dose of
tris-distamycin, versus the distamycin monomer, could be administered and an
increase in relatively specific binding to selected target sites achieved.
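
The multiplication of affinities described above can be written as a small
calculation. This is a sketch of the idealized model in the text, in which the
affinity of an oligomer is the product of the affinities of its units; the
function names are illustrative.

```python
def oligomer_affinity(k_monomer, n_units):
    """Theoretical binding constant of an n-unit oligomer, taken as the
    product of n identical monomer binding constants."""
    return k_monomer ** n_units

def favored_over_average(k_favored, k_average, n_units):
    """Ratio of oligomer affinity for favored sites to average sites."""
    return (k_favored / k_average) ** n_units

K_AVERAGE = 2e5  # binding constant for bulk genomic DNA, from the text
K_FAVORED = 2e6  # 10-fold higher affinity for favored sites, from the text

k_bis = oligomer_affinity(K_AVERAGE, 2)
k_tris = oligomer_affinity(K_AVERAGE, 3)
bis_ratio = favored_over_average(K_FAVORED, K_AVERAGE, 2)
tris_ratio = favored_over_average(K_FAVORED, K_AVERAGE, 3)
```

The same function shows how a larger monomer differential compounds:
`favored_over_average(100.0, 1.0, 3)` gives a million-fold ratio for a trimer.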

In this example, the differential activity of distamycin is only 10-fold.
Clearly, differential activities of larger magnitudes will greatly accentuate
the increased affinity effect. For example, a 100-fold difference in activity
of a 4 bp DNA-binding molecule toward high affinity and average affinity
sequences would result in (i) a 10,000-fold difference in the binding affinity
of a dimer of the molecule targeted to an 8 bp sequence, and (ii) a
million-fold increase in the binding affinity of the trimer to a 12 bp
sequence.

D. Selecting Target Sequences for Distamycin Oligomers with Flexible
and/or Variable-Length Linkers in Between the Distamycin Moieties.

The sequences that can be targeted with bis- or tris-distamycin molecules
are not limited to sequences in which the two 4 bp favored binding sites are
immediately adjacent to one another. Flexible linkers can be placed between
the distamycin moieties and sequences can be targeted that are not immediately
adjacent. The target sequences can have distances of 1 to several bases
between them; this distance depends on the length of the chemical linker.
Examples of bis-distamycin target sequences for bis-distamycins with internal
flexible and/or variable length linkers targeted to sites comprised of two TTCC
sequences are shown in Figure 26, where N is any base.

For each particular bis-distamycin, the explanations of increased
affinity and specificity remain the same as described above with the following
exception. For the case in which the linker is sufficiently flexible to span
different numbers of bases in between the two distamycin sites, the number of
sites targeted with highest affinity would be multiplied by the number of bases
spanned.

In respect to the ease of drug design and target selection, there are
several advantages to the above described targeting strategies, including the
following:

i) Any conformational changes induced by binding at the half-site
would be minimized.

ii) The affinity, therefore, would be more likely to be the product of
the affinities of the interactions observed for the monomeric sites.

iii) The half-molecule (e.g., 1 distamycin unit) would anchor the
bis-molecule (e.g., bis-distamycin), thus increasing the localized
concentration for the binding of the second half of the bis-molecule.

iv) If a simple linking chain is used, with a variable number of atoms,
the number of sites that can be targeted by multimers of the monomer increases.
This targeting method can be of value when, for example, there are no
medically significant target sites with adjacent favored binding sites for
distamycin and, therefore, no good target sites for bis-distamycin. In
this situation, the database can be screened for additional target sequences
with N1 to Nn (where N is any base) between the two target binding sequences.
For example, where n=4, the number of sequences to be searched becomes
(4^2)*4 = 64. The likelihood of finding such a sequence is reasonably high.
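
A search for two favored sites separated by one to n arbitrary bases can be
sketched with regular expressions. The function name, the return format and
the demonstration sequence are assumptions made for illustration; they do not
describe the database tools actually used.

```python
import re

def find_spaced_sites(sequence, left, right, max_gap):
    """Find `left` and `right` sites separated by 1..max_gap arbitrary
    bases. Returns sorted (start, gap_length) pairs, 0-based starts."""
    hits = []
    for gap in range(1, max_gap + 1):
        # Lookahead so that overlapping occurrences are all reported.
        pattern = re.compile("(?=%s[ACGT]{%d}%s)" % (left, gap, right))
        hits.extend((m.start(), gap) for m in pattern.finditer(sequence))
    return sorted(hits)

TOP_SITES = ["TTCC", "TTAC", "TACC", "TAAC"]
MAX_GAP = 4

# Number of distinct search patterns: 16 ordered site pairs x 4 gap lengths.
n_patterns = len(TOP_SITES) ** 2 * MAX_GAP

# Invented demonstration sequence: TTCC, four spacer bases, then TTTC.
demo_hits = find_spaced_sites("CCTTCCACGTTTTCAA", "TTCC", "TTTC", MAX_GAP)
```

Looping `find_spaced_sites` over every ordered pair in `TOP_SITES` covers the
64 patterns counted in the text.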

E. Selecting a Specific Target Site.

Using the above approach, a sequence was identified from the medically
significant target site database that contains SEQ ID NO:619, which is a subset
of the group of sequences represented by SEQ ID NO:620. SEQ ID NO:619
overlaps the binding site for a transcription factor, Nuclear Factor of
Activated T Cells (NFAT-1), which is a major regulatory factor in the induction
of interleukin 2 expression early in the T cell activation response. NFAT-1 is
crucial in (i) the T cell response, and (ii) in blocking the expression of
IL-2, which causes immunosuppression. The sequences TTCC and TTTC, the
distamycin target binding sequences in SEQ ID NO:619, rank first and seventh
in the assay.

Example 14

The Use of the Assay in Competition Studies 

The assay of the present invention measures the effect of the binding of
a DNA-binding molecule to a test site by the release of a protein from an
adjacent screening site. Accordingly, the assay is an indirect assay.
Following here is the description of an application of the assay useful to
provide confirmatory evidence of the data obtained in the initial screening
processes.

The results of the distamycin screening assay described in Example 10
suggested that there were possible false negatives: specifically, test
sequences that bind distamycin but fail to show an effect on the binding of the
reporter protein. The data suggesting false negatives were as follows. If the
assay detected strictly the affinity of binding of distamycin, then the scores
of the test sequences complementary to the high-scoring test sequences should
always be equally high. However, an examination of the highest ranking test
sequences and the complementary test sequences reveals that this is not the
case (see Table XII).



TABLE XII

Rank    Test Sequence    Complement    Rank of complement

1       TTCC             GGAA          42
2       TTAC             GTAA          244
3       TACC             GGTA          185
4       TATC             GATA          213
5       AACG             CGTT          144
6       TTCG             CGAA          216
7       TTTC             GAAA          235

All but one of the complementary sequences rank in the lower half, 4 of
them in the lowest 20%, i.e., there was little effect on reporter protein
binding in the presence of distamycin when using these sequences as test
sequences in the assay.
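
The complementary sequences in Table XII are the reverse complements of the
test sequences, and the counts quoted above can be checked directly from the
table. The following sketch assumes 256 ranked test sequences, as stated
elsewhere in the Examples; the variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# (rank, test sequence, rank of complement), copied from Table XII.
TABLE_XII = [
    (1, "TTCC", 42), (2, "TTAC", 244), (3, "TACC", 185), (4, "TATC", 213),
    (5, "AACG", 144), (6, "TTCG", 216), (7, "TTTC", 235),
]
N_SEQUENCES = 256  # all possible 4 bp test sequences

complements = [reverse_complement(seq) for _, seq, _ in TABLE_XII]
in_lower_half = sum(1 for _, _, r in TABLE_XII if r > N_SEQUENCES / 2)
in_lowest_fifth = sum(1 for _, _, r in TABLE_XII if r > 0.8 * N_SEQUENCES)
```

Six of the seven complements fall in the lower half and four fall in the
lowest 20%, in agreement with the text.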

This observation reflects the usefulness of a confirmatory assay that 
examines the relative affinity of a particular sequence for binding distamycin. 

A confirmatory assay may also be useful in revealing additional information 
about the physical characteristics of drug binding. For example, one can 
hypothesize that the reason for the apparent inverse relationship between test 
sequences with high activity in the assay and their complements is that the 
effect of distamycin is directional and only active at one test site. This 
hypothesis can be tested using the following competition experiment. 
Competitor oligonucleotides, containing test sequences of interest, are added 
to the assay mixture. This allows the determination of which test sequences 
compete most effectively with the radiolabeled test oligonucleotide for binding 
distamycin. 

Assay mixtures are prepared as described in Example 10, using a high- 
ranking test oligonucleotide, e.g., TTCC (ranking = #1), as the radiolabeled 
oligonucleotide in the experiment. The test oligonucleotide TTCC is labelled
to high specific activity with γ-32P-ATP as described in Example 8; in this
example, the labeled TTCC oligonucleotide will be referred to as the "high
specific activity test oligonucleotide".

The competitor oligonucleotides are labeled as described in Example 8,
except that the ATP used for kinasing the primer is 1:200
radiolabeled:nonradiolabeled. In other words, the competitor oligonucleotides
are tracer labeled with radioactive phosphorus to a 200-fold lower specific
activity than the high specific activity test oligonucleotide. Since all of the
competitor oligonucleotides are labeled with the same radiolabeled primer
molecule, the relative concentrations of the competitor DNAs can be determined
with high accuracy. Further, since the specific activity is the same, the
concentrations can be adjusted to be the same. For the purposes of this
example, the competitor DNAs are referred to as "low specific activity
competitor oligonucleotides."

The use of competitor DNAs for which the concentration is known is 
important for the competition experiment. The accuracy of the competition 
assay may be further enhanced by separating any unincorporated radiolabeled 
primer from the double stranded competitor oligonucleotides. This separation 
can be achieved using, for example, a 6-20% polyacrylamide gel. The gel is 
then exposed to x-ray film and the amount of double-stranded oligonucleotide 
determined by use of a scanning laser densitometer, essentially as described in 
the Examples above. 

The competition assay is performed as described in Example 10, except
that competitor DNAs are added in increasing relative concentration to the high
specific activity test oligonucleotide. The DNA concentration ([DNA]) is held
constant and the UL9 concentration ([UL9]) and distamycin concentration
([distamycin]) are as described in Example 10. The components in the
competition assay samples are as follows.

Controls:

UL9 + TTCC*; UL9 + TTCC* + Competitors; UL9 + TTCC* + distamycin;

Test samples:

UL9 + TTCC* + distamycin + Competitors;

where UL9 is UL9-COOH, TTCC* is the high specific activity test
oligonucleotide, and Competitors are the low specific activity competitor
oligonucleotides.

TTCC-low (the tracer-labeled low specific activity competitor) competes
with TTCC* on an equimolar basis for the binding of both protein and
distamycin. A competitor molecule with lower affinity for distamycin than TTCC
requires a higher molar ratio to TTCC* to compete for distamycin binding. The
competition for protein between all competitors is constant. Only the
competition for distamycin varies; the variability is due to the differential
affinity of the competitor oligonucleotides for distamycin. The concentration
of competitor used in these experiments varies over a range of concentrations
and is determined empirically by (a) the test molecule concentration, and (b)
the relative affinity of the competitor and the radiolabeled test
oligonucleotide. Typically, the competitor DNA consists of only the test
sequence, that is, no additional sequences are connected to the test sequence.

The competition assay described here facilitates the determination of
actual rank between the test oligonucleotides that are detected as highly
effective molecules in the original assay. The competition assay also
facilitates the detection of false negatives. As described above, the results
of the assay discussed in Example 10 imply "directional" binding of distamycin,
in which the effect of binding is only detected when the molecule is bound in
one direction with respect to the UL9 protein. Binding in the opposite
direction (i.e., to the complementary test sequence) is not detected with the
same activity in the assay.

The purpose of this competition experiment is to use the test
oligonucleotides to compete for the binding of distamycin. If the sequences
complementary to the "best binders" are false negatives in the assay, they
should nonetheless be effective competitors in the competition assay.

Example 15

A Method of Selecting Target Sequences
From Database Sequence Information

The binding of a drug or other DNA-binding molecule to the recognition
sequence for TFIID, or other selected transcription factors, is expected to
alter the transcriptional activity of the associated gene. TATA-boxes, which
are the recognition sequences for the transcriptional regulatory factor TFIID,
are associated with most eukaryotic promoters and are critical for the
expression of most eukaryotic genes. Targeting a DNA-binding drug to TATA
boxes in general would be undesirable. However, sequences flanking TATA box
sequences are typically unique between genes. By targeting such flanking
sequences, perhaps with one base overlapping the TFIID recognition site, each
gene can be targeted with specificity using the novel DNA-binding molecules
designed from the data generated from the DNA-binding drug assay. One method
for determining novel and specific target sequences for novel DNA-binding drugs
is described here. The method may be applied to any known binding site for any
specific transcription factor, regardless of whether the identity of the
transcription factor itself is known.

TATA-boxes have been determined for a large number of genes. Typically,
the TATA-box consensus sequence has been identified by examining the DNA
sequence 5' of the RNA start site of a selected gene. However, the most
rigorous determinations of TATA boxes have also demonstrated the transcription
factor binding site by DNA protection experiments and DNA:protein binding
assays (using electrophoretic methods). Many of these sites are annotated in
the public databases "EMBL" and "GENBANK", which both contain nucleic acid
sequences. Unfortunately, the flat field listings of these databases do
not consistently annotate these sites. It is possible, however, to
automatically search a database, using a text parsing language called AWK, to
extract most sequence information that relates to annotated promoter sequences.

The following is a description of how selected promoter sites were
located in the public database from "EMBL." The flat field annotations from
"EMBL" Version 32, as processed by "INTELLIGENETICS" (Mountain View, CA), were
obtained with the set of UNIX programs called "IG-SUITE." These programs were
executed on a "SUN IPX" workstation. An AWK script was used to parse all the
primate annotation files listed in the "EMBL" database. The AWK interpreter is
supplied as part of the system software that comes with the "SUN IPX"
workstation.

The following is a description of how the AWK script parses annotation files,
looking for and printing information relating to promoters and TATA-boxes. The
system is asked to examine the input files for certain key words in the header
lines or annotations to the sequence. The AWK interpreter reads input files
line by line and executes functions based on patterns found in each line. In
this case, the AWK system read the annotation files of EMBL. The following is
a description of how the AWK script can be used to parse out sequences
containing TATA-boxes.

The program first examines the files for all header lines containing the
word "complete" but not "mRNA" or "pseudogene"; the output is printed.
Complete genes sometimes contain the promoter sequences but complete mRNA genes
do not contain the promoters. mRNA genes are not of interest for the purpose
of detecting promoter elements. Next, the AWK system looks for the word "exon
1" and, if it finds it, prints the header and "DE" line. Then it looks for "5'"
and prints the header line if it does not contain the word "mRNA". Next it
looks for the word "transcription" and, if it finds it, prints the preceding
and following lines along with the description line.

Next, the AWK system examines the files for the word "TATA" in the header
lines or references. This result is printed. After this it looks for the
word "promoter" and, if it finds it, prints that line and the line after it,
which contains the information about the promoter. Then the program looks for
"protein_bind" and prints that line along with the next one. The description
of "protein_bind" is usually used to mark potential binding sites of
transcription factors in the "EMBL" database. AWK then scans for any annotated
primary mRNA start sites. The promoter sequence is found in front of the start
site. Finally, any exon 1 start sites that are annotated in the feature table
are extracted. Exon 1 start sites should also be the primary transcription
start site, and the TATA boxes usually are found approximately 25-35 base pairs
5' to the transcriptional start site.

The actual AWK script is included here as an example of how to parse a 
database to extract promoter sites: 
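BEGIN {print_next_line=0}
# Print the line that follows a match flagged on the previous line.
{if (print_next_line==1)
    {print $0
     print_next_line=0 }
}
# Remember the current header line. The test on this line is truncated in
# the source; a match on header lines (/^>/) is assumed here.
{if ($0 ~ /^>/)
    { Locus=$0
      l_flag=0 }
}
# Header lines for complete genes, exon 1, and 5' regions (mRNA excluded).
/^>/ && /[Cc]omplete/ && $0 !~ /mRNA|mrna/ && $0 !~ /pseudogene/ {print}
/^>/ && /exon 1[^0-9]/ {print}
/^>/ && /5'/ && $0 !~ /mRNA|mrna/ {print}
# Any mention of transcription: print header, previous line, and this line.
/[Tt]ranscription/ {print Locus "\n" PL "\n" $0; print_next_line=1}
# Feature table (FT) lines with a TATA note.
{if ($0 ~ /^FT/ && $0 ~ /TATA/ && $0 ~ /note/)
    {print Locus "\n" PL "\n" $0}
}
# The last pattern on the next line is garbled in the source; a match on a
# literal "/" (the start of a feature qualifier) is assumed.
{if ($0 ~ /^FT/ && $0 ~ /[Tt]ranscription/ && $0 ~ /\//)
    {print Locus "\n" PL "\n" $0}
}
{if ($2 !~ /note/ && $2 ~ /TATA/) {print Locus "\n" $0} }
# Promoter features: print the header once per entry, then each line.
{if ($2 ~ /promoter/)
    { print_next_line=1
      if (l_flag==0)
          {print Locus "\n" $0
           l_flag=1}
      else
          print $0
    }
}
{if ($2 ~ /protein_bind/)
    {print Locus "\n" $0
     print_next_line=1 }
}
# Primary transcripts that do not begin at base 1 of the entry.
{if ($2 ~ /prim_transcript/ && $3 !~ /^1\.\.|<1\.\./)
    {print Locus "\n" $0
     print_next_line=1 }
}
# Exon 1 annotated in the feature table.
{if ($0 ~ /^FT/ && $0 ~ /number=1[^0-9]/)
    if (PL ~ /exon/) {print Locus "\n" PL "\n" $0}
}
# Keep the current line as the "previous line" for the next record.
{PL=$0}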

After the AWK script is run on the database the output is manually
examined. Those sites that are clearly promoter sites are noted and nucleotide
coordinates recorded. Other gene sequences are examined using the "FINDSEQ"
program of "IG_SUITE" to see if the promoter sites can be determined or if the
references in the database describe the promoter sequences. If so, those
nucleotide coordinates are noted. At the end of this examination "FINDSEQ" is
used to extract any sequences containing promoter sequences by using an
indirect file of "LOCUS" names constructed using a text editor.

A parsing program was also written to extract each of the annotated sites
from the file that "FINDSEQ" extracted from "EMBL." This program extracts the
following information: the promoter site name and four numbers representing
the nucleotide coordinates of where the sequence is to start, the
coordinate of the first base of the site, the coordinate of the last base of
the site, and the end of the sequence to be extracted. A large batch file was
constructed to automatically extract each of the promoter sites. These
sequences formed the basis of Table V.
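
The four-coordinate extraction described above might look like the following
sketch. The original parsing program is not reproduced in this document, so
the function name, the 1-based inclusive coordinate convention and the
demonstration sequence are all assumptions.

```python
def extract_site(sequence, seq_start, site_first, site_last, seq_end):
    """Cut a promoter site and its flanks out of a longer sequence.

    All four coordinates are assumed to be 1-based and inclusive:
    where the extract starts, the first and last bases of the site,
    and where the extract ends.
    """
    if not 1 <= seq_start <= site_first <= site_last <= seq_end <= len(sequence):
        raise ValueError("coordinates out of order or out of range")
    return {
        "upstream": sequence[seq_start - 1:site_first - 1],
        "site": sequence[site_first - 1:site_last],
        "downstream": sequence[site_last:seq_end],
    }

# Invented demonstration sequence containing a TATA-like site.
demo = extract_site("GGGCGCTATAAAAGGCCC", 4, 7, 13, 16)
```

Keeping the flanks separate from the site makes it straightforward to select
target sequences that overlap the site by a single base.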
The Sequence Listing presents a number of sequences that are useful as
test sequences in the present invention. SEQ ID NO:1 to SEQ ID NO:481 and SEQ
ID NO:600 correspond to promoter targets (typically, TATA box-containing sites)
for human genes. SEQ ID NO:482 to SEQ ID NO:599 correspond to promoter targets
for viral genes.

Example 16

Using Normalized Values to Determine
Sequence Specificity and Relative Binding Affinity

A. The Assay Mixture and Calibrator Samples.

The assay mixture is prepared as described in Example 10. The
concentration of the components can be varied as described in the Detailed
Description.

The assay mixtures containing both UL9 and DNA are incubated at room
temperature for at least 10 minutes to allow the DNA:protein complexes to form
and for the system to come to equilibrium. At time = 0, the assay is begun by
adding water (control samples) or test molecule (typically at 1-5 µM, test
samples) to the assay mixtures using a 12-channel micropipettor. After
incubation with drug for 5-120 minutes, samples are taken and applied to
nitrocellulose filters using a 96-well dot blot apparatus (Schleicher and
Schuell) held at 4°C.

Calibrator samples are used to normalize the results between plates, that
is, to take plate-to-plate variability into account. Calibrator samples are
prepared using 2-fold serial dilutions of DNA in the assay mixture and
incubating duplicate samples in one column of the 96-well assay plate. The
highest concentration of DNA used is the same concentration used in the
screening samples. In general, calibrator samples are used in all experiments.
However, use of calibrator samples appears to be less important for
experiments using blocked plates since the variability between blocked plates
is lower than between unblocked plates.
The calibrator samples are used to normalize the values between plates as
follows. The volume values (Example 10) for the calibrator samples are
obtained from densitometry. Volume values are plotted against DNA
concentration. The plots are examined to ensure linearity. The volume values
for the points on the calibrator line are then averaged for each plate. A
factor, designated the normalization factor, is then determined for each
calibrator line. When the normalization factor is multiplied by the average of
the points on each calibrator line, the product is the same number for all
plates. Usually, the average of the line averages is used for determining the
normalization factor, although in theory, any of the line average numbers can
be used. The operating assumption in this analysis is that the differences in
the calibrator samples reflect the differences in adsorption for each plate.
By normalizing to the calibrator samples, these variations are minimized.

Once the normalizing factor is obtained, all of the raw volume values for
each of the test assays on the plate are multiplied by the normalizing factor.
For example, if the following data were obtained, the process of normalization
would be as follows:



TABLE XIII

PLATE                  DNA CONCENTRATION
NUMBER         0.8     0.4     0.2     0.1     Average

Plate I:       4000    2000    1000    500     1875
Plate II:      4200    2100    1050    525     1969
Plate III:     3800    1900    950     475     1781

Average                                        1875







Plate I has a normalization factor of 1; Plate II has a normalization
factor of 1875/1969 = 0.95; Plate III has a normalization factor of 1875/1781
= 1.05. The equation used to establish these numbers is as follows: "Average
average"/line average = normalization factor.

If the normalization factors are different, these factors are
incorporated into the data analysis. The sample data on each plate are then
multiplied by the normalization factor to obtain normalized volume values.
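
The normalization factors for Table XIII can be reproduced with a short
calculation. The data below are the calibrator values from the table; the
function and variable names are illustrative only.

```python
# Calibrator volume values from Table XIII, highest DNA concentration first.
calibrator_volumes = {
    "Plate I": [4000, 2000, 1000, 500],
    "Plate II": [4200, 2100, 1050, 525],
    "Plate III": [3800, 1900, 950, 475],
}

def line_average(volumes):
    """Average of the points on one calibrator line."""
    return sum(volumes) / len(volumes)

def normalization_factors(calibrators):
    """Average of the line averages divided by each plate's line average."""
    averages = {plate: line_average(v) for plate, v in calibrators.items()}
    average_of_averages = sum(averages.values()) / len(averages)
    return {plate: average_of_averages / avg for plate, avg in averages.items()}

factors = normalization_factors(calibrator_volumes)
normalized_averages = {
    plate: factors[plate] * line_average(v)
    for plate, v in calibrator_volumes.items()
}
```

After multiplication by its factor, every plate has the same calibrator
average (1875), which is the property the normalization is meant to ensure.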

B. The Capture/Detection System.

A 96-well dot blot apparatus is typically used to capture the DNA:protein
complexes on a nitrocellulose filter as described in Example 10.

C. Quantitation of Data.

The autoradiographs of the nitrocellulose filters are analyzed as 
described in Example 10. 

D. Analysis of Data.

After densitometry, the data are analyzed using a spreadsheet program,
such as "EXCEL." For each plate, the calibrator samples are examined and used
to determine the normalization value. Then, for each test oligonucleotide, at 
each drug concentration and/or each time point, a normalized % score is 
calculated. The normalized % score (n%) can be described as follows: 
n% = (nT/nC) x 100, 

where (i) nT is the densitometry volume of the test sample multiplied by the 
normalization factor for the plate from which the sample was obtained, and (ii) 
nC is the densitometry volume of the control sample multiplied by the 
normalization factor for the plate from which the sample was obtained. The 
oligonucleotides are then ranked from 1 to 256 based on their n% scores. 
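
The n% calculation and the ranking step can be sketched as follows. The
function names and the demonstration values are hypothetical, and the ranking
direction (lowest n% receives rank 1) is an assumption, since the text does
not state which end of the scale is ranked first.

```python
def normalized_percent(test_volume, test_factor, control_volume, control_factor):
    """n% = (nT/nC) x 100, where each densitometry volume is multiplied by
    the normalization factor of the plate it came from."""
    n_t = test_volume * test_factor
    n_c = control_volume * control_factor
    return n_t / n_c * 100.0

def rank_sequences(n_scores, lowest_first=True):
    """Assign ranks 1..N to test sequences according to their n% scores."""
    ordered = sorted(n_scores, key=n_scores.get, reverse=not lowest_first)
    return {seq: rank for rank, seq in enumerate(ordered, start=1)}

# Invented demonstration values, not assay results.
demo_n = normalized_percent(2000, 0.95, 4000, 1.05)
demo_ranks = rank_sequences({"TTCC": 40.0, "GGAA": 90.0, "TTAC": 55.0})
```

When the test and control samples come from the same plate, the two factors
cancel and n% reduces to the simple ratio of the raw volumes.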

While the invention has been described with reference to specific methods 
and embodiments, it will be appreciated that various modifications and changes 
may be made without departing from the invention. 



